Closed Bug 1639801 Opened 5 years ago Closed 4 years ago

DigiCert: Failure to revoke key-compromised certificates within 24 hours

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mpalmer, Assigned: brenda.bernal)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Attachments

(3 files)

Steps to reproduce:

Between 2020-04-25 23:02:40 and 2020-05-02 11:33:53 (all times UTC), three certificate problem reports were sent to revoke@digicert.com, each stating that a private key had been compromised and requesting that all certificates issued by Digicert using the specified SPKI be revoked. The URL of a CSR attesting to the compromise of the private key, signed by the compromised private key, was provided in each case.

The delivery time, SPKI, and MX server (with IP address) for each report are as follows:

2020-04-25 23:02:40 5f28438da8bd0c1ff34d78aff34d4c08c303744c9e63c1743ce43447a35e971e us-smtp-inbound-1.mimecast.com (205.139.110.177)
2020-05-02 11:33:50 4dea5e84b19f8ebc034cbcef01286055a6b3980a6cf385d7c46e6482280fe28d us-smtp-inbound-1.mimecast.com (207.211.30.221)
2020-05-02 11:33:53 2eaefbf90ac1d4fa80067c5d643b7ffc176ff6c23da0b66e564e4659ef5390b2 us-smtp-inbound-1.mimecast.com (207.211.30.221)

Actual results:

In each case, one or more certificates for each SPKI were not revoked within 24 hours of the certificate problem report being received (based on the revocation timestamp recorded in a validly signed OCSP response). The sent time, revocation time, and time taken to revoke are given below.

2020-04-25 23:02:40 2020-04-27 01:55:32 (1 day 02:52:51)
2020-05-02 11:33:50 2020-05-03 11:37:30 (1 day 00:03:39)
2020-05-02 11:33:53 2020-05-03 11:38:05 (1 day 00:04:11)
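
The elapsed times above can be re-derived from the two timestamps in each row. The following is a minimal sketch, assuming whole-second UTC timestamps; the figures in the report were presumably computed from sub-second values, so a result can differ from the listed one by a second. The helper name `revocation_delay` is hypothetical and does not come from the report.

```python
from datetime import datetime, timedelta, timezone

DEADLINE = timedelta(hours=24)  # limit for key compromise, measured from receipt
FMT = "%Y-%m-%d %H:%M:%S"

def revocation_delay(reported: str, revoked: str) -> timedelta:
    """Time from delivery of the problem report to the OCSP revocation time (UTC)."""
    start = datetime.strptime(reported, FMT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(revoked, FMT).replace(tzinfo=timezone.utc)
    return end - start

# (report delivered, revocation time) pairs taken from the list above
cases = [
    ("2020-04-25 23:02:40", "2020-04-27 01:55:32"),
    ("2020-05-02 11:33:50", "2020-05-03 11:37:30"),
    ("2020-05-02 11:33:53", "2020-05-03 11:38:05"),
]
for reported, revoked in cases:
    delay = revocation_delay(reported, revoked)
    print(reported, delay, "LATE" if delay > DEADLINE else "ok")
```

All three rows exceed the 24-hour limit when the clock starts at delivery of the report.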

Expected results:

All certificates to have been revoked within 24 hours of the problem report being received.

Summary: GoDaddy: Failure to revoke key-compromised certificates within 24 hours → Digicert: Failure to revoke key-compromised certificates within 24 hours

This goes back to the fun age old question of when do we actually receive notice of key compromise - when we receive the email or when we finish the investigation of the certificate problem report.

In each case the timeline was set for 24 hours from when we confirmed key compromise (which I think is impressive since it's 3-4 minutes after Matt emailed us). I think all this illustrates is the need for automated revocation processes.

It doesn’t though, because we settled that age old question ages ago.

As per 4.9.5:

The period from receipt of the Certificate Problem Report or revocation-related notice to published revocation MUST NOT exceed the time frame set forth in Section 4.9.1.1.

Flags: needinfo?(jeremy.rowley)

You're right. My mistake. We did miss three cert revocations by a timeframe ranging from 2 minutes to 2 hours.

Investigating the issue, the timer didn't get appropriately set.

Flags: needinfo?(jeremy.rowley)
Assignee: bwilson → brenda.bernal
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Here's the incident report for this:

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

07:12 21/5/2020 UTC – Matt Palmer opened this case on Bugzilla

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
    23:02 25/04/2020 UTC – Key with SPKI 5f28438da8bd0c1ff34d78aff34d4c08c303744c9e63c1743ce43447a35e971e reported to revoke@digicert.com as compromised.
    1:55 27/04/2020 UTC – certificates for 5f28438da8bd0c1ff34d78aff34d4c08c303744c9e63c1743ce43447a35e971e revoked
    https://crt.sh/?id=2731635583&opt=ocsp
    https://crt.sh/?id=2736343552&opt=ocsp
    11:33 02/05/2020 UTC Key with SPKI 4dea5e84b19f8ebc034cbcef01286055a6b3980a6cf385d7c46e6482280fe28d reported to revoke@digicert.com as compromised.
    11:37 03/05/2020 UTC certificate for 4dea5e84b19f8ebc034cbcef01286055a6b3980a6cf385d7c46e6482280fe28d revoked
    https://crt.sh/?id=2748074788&opt=ocsp
    11:33 02/05/2020 UTC Key with SPKI 2eaefbf90ac1d4fa80067c5d643b7ffc176ff6c23da0b66e564e4659ef5390b2 reported to revoke@digicert.com as compromised.
    11:37 03/05/2020 UTC certificate for 2eaefbf90ac1d4fa80067c5d643b7ffc176ff6c23da0b66e564e4659ef5390b2 revoked
    https://crt.sh/?id=2746347646&opt=ocsp
    07:12 21/5/2020 UTC – Matt Palmer opened this case on Bugzilla

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
    These were not mis-issued, so no changes to issuance have been made.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    4 certificates issued between 25 April 2020 and 1 May 2020

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

https://crt.sh/?id=2731635583&opt=ocsp
https://crt.sh/?id=2736343552&opt=ocsp
https://crt.sh/?id=2748074788&opt=ocsp
https://crt.sh/?id=2746347646&opt=ocsp

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
    In the past, when we received a report of a compromised key, we followed this process:
  • Confirm Report
  • Contact Customer
  • Set a reminder for us to manually make sure revocation occurred before the deadline (thus giving the customer the maximum time to replace before the certificate was revoked)

In these four cases, the deadline was missed due to the manual nature of initiating the revocation. The reminder for when revocation needed to happen was set late, by anywhere from a few minutes to a few hours, because of human error.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

We have the system set up for reminders, but the long-term solution is to automate the entire revocation process (as mentioned on one of the many similar bugs). Then we can get away from manually accepting and processing email revocation requests. We currently feed the revocation into the system, and the time could be set based on when the CSR is put into the system. However, that would always result in late revocation because of the gap between receiving the email and putting the CSR through the revocation engine. The result is the same - we need to automate revocation in order to ensure to-the-minute revocations. The first part of this is the blacklist check that is planned for June.

The blacklist key checker is still on track for June 30. Additional automation will be coming after, including the revocation monitoring.

The key blacklist checker is live for DigiCert. We do plan on expanding it as a service where people can upload keys, check for compromised keys and share compromised keys across multiple CAs, but that's out of scope of this particular incident. Right now, all revoked keys get added to the blacklist check when confirmed as compromised, meaning they can't be reused for TLS certs within DigiCert.
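
As an illustration of the kind of check described above, here is a minimal sketch. It assumes the blocklist is keyed by the SHA-256 hash of the DER-encoded SubjectPublicKeyInfo; the hex identifiers in the original report have that length, but the internal format of DigiCert's checker is not described in this bug. The class and method names are hypothetical.

```python
import hashlib

class KeyBlocklist:
    """In-memory stand-in for a compromised-key blocklist (hypothetical)."""

    def __init__(self) -> None:
        self._fingerprints = set()

    @staticmethod
    def fingerprint(spki_der: bytes) -> str:
        # SHA-256 over the DER-encoded SubjectPublicKeyInfo, hex-encoded.
        return hashlib.sha256(spki_der).hexdigest()

    def add_compromised(self, spki_der: bytes) -> None:
        # Called once a key has been confirmed as compromised.
        self._fingerprints.add(self.fingerprint(spki_der))

    def may_issue(self, spki_der: bytes) -> bool:
        # Issuance is refused when the key's fingerprint is on the list.
        return self.fingerprint(spki_der) not in self._fingerprints

blocklist = KeyBlocklist()
compromised = b"placeholder bytes standing in for a DER-encoded SPKI"
blocklist.add_compromised(compromised)
print(blocklist.may_issue(compromised))
print(blocklist.may_issue(b"placeholder bytes for a different key"))
```

Hashing the public key rather than storing certificates means any later request reusing the same key is caught regardless of the names it asks for.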

Although we could close the bug now, should we instead keep it open until smime is also using the blacklist checker?

Deferring to Ben for S/MIME, with respect to 6.2 (10) of the Mozilla Root Store Policy, v2.7

Flags: needinfo?(bwilson)

There are really two issues here: (1) timely revocation and (2) screening for compromised private keys (which appears to be complete with a blacklist system that screens for these keys prior to their inclusion in TLS certificates).
DigiCert's proposed solution for (1) appears to require additional development work to better automate revocation processes. So, as a supplement to item 7 in the incident report, I’m interested in knowing more about DigiCert's short-term/immediate plans and 60-day timeline to improve automation of revocation processes. Then, assuming that DigiCert's short-term improvements are implemented for revocation process automation, I am willing to close this bug prior to the implementation for SMIME private key checking with the understanding that it will be done along with long-term improvements to revocation processing, which are being worked on.
In addition to short-term remediations related to (1), and in regards to (2), it would be good if DigiCert could also describe an instance where the blacklist checker has now worked in a real-world situation, as an example that the new process works as intended. Thanks.

Flags: needinfo?(bwilson)

Automation of revocation will be a long term project. Once we get the blacklist checker built as a service, we are going to require that all key compromise requests go through there and require a CSR. The tool already checks the signature to confirm compromise and schedules revocation. Once that is done, we need it to automatically revoke the certs, which means tying the scheduler to the one on the administration side so they talk to each other.

Here's an example of one that it worked with. Pretty unexciting, but:

https://crt.sh/?id=2976315260

Hey Ben - do you want to close this one as well? We've started documenting how the long-term revocation project will work. It's similar to the one described on Sectigo's bug.

I intend to close this bug on or about 28-July-2020 unless additional issues or questions are presented.

Flags: needinfo?(bwilson)

Jeremy: Could you be more precise here? "described on Sectigo's bug" can refer to... well, dozens of bugs. And it'd be good to hear what DigiCert is doing, if not committing exactly 1:1 to what Sectigo is doing.

Flags: needinfo?(jeremy.rowley)
Attached image Key Compromise.png

Sorry - I should have done that in the first place. Attached are two diagrams - one describing the current plan for how we will handle key compromise and one how we will handle certificate problem reports. The two are similar flows and similar systems. The idea being that the default is towards compliance instead of a manual failing resulting in non-compliance. We are working on the key compromise stuff first followed by the cert problem reporting system. These are both subject to change as I'm sure we'll have revisions during development, and I plan on taking any lessons learned from the key compromise system and using them to enhance the cert problem report system.

Digicert is built with an API-first mentality so we are planning API and GUI access to the revocation portal. For key compromise, any entity can submit a CSR to the system and an email address. The system checks to see if the CSR does confirm control. If the CSR confirms control, the system sends an email to the address provided that affirms control and that all applicable certs will be revoked within 24 hours. If control cannot be confirmed, an email is sent informing the submitter that we can't confirm control. This would only really happen if the CSR is malformed or the key is something we'd reject anyway. The revocation engine then logs the cert in our compromised key database. The compromised key database then scans for certs with the key and schedules revocation of each of them for 22 hours from the time the compromise report is submitted. 22 hours is to account for the potential delay in the revocation appearing at the CDN. The automated search by the key database is already complete. An email is also sent to the subscriber informing them that the certificate is going to be revoked and the time the revocation will happen. At 22 hours from the email, all certificates with the key are revoked.
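
The timing in this flow can be sketched as follows. This is only an illustration under stated assumptions: the 22-hour offset and the 24-hour limit come from the description above, while the `ScheduledRevocation` type, the `schedule_revocations` function, and the serial strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

REVOCATION_DEADLINE = timedelta(hours=24)  # limit measured from report submission
SCHEDULING_OFFSET = timedelta(hours=22)    # leaves 2 hours for propagation delays

@dataclass(frozen=True)
class ScheduledRevocation:
    serial: str
    revoke_at: datetime
    deadline: datetime

def schedule_revocations(submitted_at, serials):
    """Schedule every certificate sharing the compromised key 22 hours after submission."""
    revoke_at = submitted_at + SCHEDULING_OFFSET
    deadline = submitted_at + REVOCATION_DEADLINE
    return [ScheduledRevocation(s, revoke_at, deadline) for s in serials]

submitted = datetime(2020, 5, 2, 11, 33, 50, tzinfo=timezone.utc)
jobs = schedule_revocations(submitted, ["serial-A", "serial-B"])
for job in jobs:
    # The last column is the safety margin left before the 24-hour limit.
    print(job.serial, job.revoke_at.isoformat(), job.deadline - job.revoke_at)
```

Anchoring both times to the submission timestamp, rather than to the moment a person confirms the report, is what removes the gap that caused the late revocations in this incident.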

The certificate problem report is similar to the above except that it pings a validation agent to look at the issue and determine if the report is real. The validation agent can either approve the revocation or reject it as insufficient information. The response is a templated response that goes back to the email provided by the researcher. If approved, revocation is scheduled for 5 days from the submission. An email will also be sent to the compliance team alerting us that there was an incident with a certificate. This way we can investigate and figure out if there is an issue with the system and file a bug with Mozilla.

I realize there are more reasons for 24 hours than just key compromise. However, they are rarer than key compromise so I thought they could go through the normal revocation process. If they are a subscriber, they can always revoke the certificate through their account.

Thoughts?

Flags: needinfo?(jeremy.rowley)

How will your system process the CSR to confirm control of the private key? Doesn't the CSR have to contain a signed challenge or some language like "This key is compromised" or something that otherwise demonstrates key compromise? What are the rules for what is an acceptable CSR and what is unacceptable?

We made some minor revisions to the process as dev work has started on it.

  1. The reporter submits a key compromise to the DigiCert Key Compromise Service. At this time we are planning both API and UI support. The reporter needs to include an email address where we can send the response to the certificate problem report.
  2. The service enters a process that confirms evidence of key compromise. If a private key is submitted, we use open source crypto libraries to perform a basic validity check. Valid keys are always assumed to be compromised. If the submission is a signed string plus public key, the service verifies the public key and uses the public key to verify the signed string. The public key should be in a CSR format.
    3a. Once confirmed, the service will schedule revocation for any impacted cert for 22 hours in the future. This accounts for clock skew and OCSP/CRL caching delays.
    3b. Once confirmed, the key is added to the compromise key blocklist. This prevents any additional certs from issuing using that key.

Either way, a notice is sent to the email address of the reporter explaining the action being taken.
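
The steps above can be sketched as a small decision flow. This is a rough illustration, not DigiCert's implementation: the two checker callables are hypothetical placeholders supplied by the caller (real validation would use a crypto library), and the key identifiers, serials, and notice strings are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Set

@dataclass
class CompromiseService:
    # Caller-supplied checks (hypothetical); a real service would use a crypto library.
    key_is_valid: Callable[[bytes], bool]
    signature_is_valid: Callable[[bytes, bytes], bool]
    certs_by_key: Dict[str, List[str]]  # key id -> certificate serials
    blocklist: Set[str] = field(default_factory=set)
    schedule: Dict[str, datetime] = field(default_factory=dict)

    def submit(self, key_id, submitted_at, private_key=None, csr=None, signed=None) -> str:
        # Step 2: confirm the evidence of compromise.
        if private_key is not None:
            confirmed = self.key_is_valid(private_key)
        elif csr is not None and signed is not None:
            confirmed = self.signature_is_valid(csr, signed)
        else:
            confirmed = False
        if not confirmed:
            return "notice: compromise could not be confirmed"
        # Step 3a: schedule revocation of every impacted certificate in 22 hours.
        for serial in self.certs_by_key.get(key_id, []):
            self.schedule[serial] = submitted_at + timedelta(hours=22)
        # Step 3b: block further issuance with this key.
        self.blocklist.add(key_id)
        return "notice: compromise confirmed, revocation scheduled"

service = CompromiseService(
    key_is_valid=lambda key: len(key) > 0,       # stub check, not real validation
    signature_is_valid=lambda csr, sig: False,   # stub check, not real verification
    certs_by_key={"key-1": ["serial-A", "serial-B"]},
)
now = datetime(2020, 10, 15, 12, 0, tzinfo=timezone.utc)
first = service.submit("key-1", now, private_key=b"stand-in key bytes")
second = service.submit("key-2", now, csr=b"stand-in csr", signed=b"stand-in sig")
print(first)
print(second)
```

In both branches the reporter receives a notice, which mirrors the statement that a response is sent either way.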

This is a pretty hefty project. We're currently predicting the revocation project will take until the first part of October to finish. We run a pretty standard version of scrum with two-week sprints, so the end dates are a bit fluid, but we can share progress every two weeks.

For proof of key compromise, we will accept 1) actual private keys and 2) CSRs + a signed string. We will update the CPS when the system is ready to clearly show we require this process for key compromise.

The revocation project is still tracking as planned - as Jeremy noted above. We will continue to share updates on progress.

Still progressing with an MVP release in early Oct.

Still progressing towards the first of Oct. That will be key compromise only. Afterwards we will expand that to reasons other than key compromise.

It's currently in testing. You can see it here. https://problemreport.digicert.com/key-compromise/intro

There's instructions on what we accept as proof of key compromise. Both API and GUI access are available. We added both in case people like Matt Palmer wanted to hook into it. We think testing will be complete on Oct 12. We updated our CPS to include this as the only way we accept key compromise reports effective Oct 15th.

This is now live and working. People can submit compromised key revocation requests via the portal.

I believe that this bug can be closed, unless people have additional questions or issues to raise. Therefore, I'll schedule it to be closed on or about 23-October-2020.

> There's instructions on what we accept as proof of key compromise.

All I see when I visit that URL (after enabling JavaScript) is "Hm, nothing here. The page you’re trying to reach doesn’t exist. Try again or contact us at support@digicert.com"

In any event, I'd be interested to see how you believe that the approach you've taken is in line with the spirit of Mozilla's requirements, given that previous attempts by CAs to create unnecessary barriers to problem reporting have not been well received.

Sorry - that was the wrong link for the instructions. The CPS is correct: http://problemreport.digicert.com/key-compromise. We are going to migrate all of this to problemreport.digicert.com in the next CPS update. This isn't an unnecessary barrier to problem reporting. It's automation, which is why we provide both an API and a web interface for cert problem reporting. It fits both the spirit and letter of the Mozilla requirements.

The unnecessary barrier is requiring the use of the API or web interface for key compromise reporting, with evidence required to be provided in a Digicert-specific format -- along with the requirement to agree to a Terms of Service and an 11 page privacy policy -- in order to report a problem with a certificate.

The concerns Ryan and I raised in https://bugzilla.mozilla.org/show_bug.cgi?id=1639804 also appear to apply here, and remain, as far as I am aware, unanswered by either Digicert or Sectigo. Please consider the relevant concerns from that bug included by reference.

I reviewed the Sectigo bug back when this was posted (and referenced in comment #10). I'll leave it to Ben whether we are forced to still accept key compromise reports by email now that we have this tool. We still have the email address, but we just push everything through the tool. I am not planning to accept key compromise proof that can't be fed through the tool, even if we receive an email with a link to a third party resource. Feels broken to force CAs away from automation, but there should be a universal tool to use to report key compromise so that the public isn't forced to adapt to each CA's reporting mechanism.

> I reviewed the Sectigo bug back when this was posted (and referenced in comment #10)

The concerns that Ryan and I raised were posted subsequent to your posting of comment #10 in this bug.

> I am not planning to accept key compromise proof that can't be fed through the tool

Which necessarily excludes any instance of key compromise that does not conform to a very narrow view of what "key compromise" can actually entail.

> there should be a universal tool to use to report key compromise so that the public isn't forced to adapt to each CA's reporting mechanism.

There is one: ACME. It's incomplete, but it's a start, it's standardised, it's not hard to implement, and it's no worse than what Digicert has chosen to deploy.

(In reply to mpalmer from comment #27)

> ...
> The concerns Ryan and I raised in https://bugzilla.mozilla.org/show_bug.cgi?id=1639804 ... remain, as far as I am aware, unanswered by ... Sectigo.

Huh? Matt, bug 1639804 is closed. Twice in that bug (c18 and c20) we said that we are once again accepting key compromise reports via email.

I have nothing more to add to this bug. I believe the system is operating in accordance with the Mozilla policy, which does not require email as the reporting mechanism for key compromise. That requirement was rejected in a previous Mozilla policy update. We request this bug be marked as resolved.

I will close this bug on or about 18-Nov-2020 unless there are more questions or issues raised.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Summary: Digicert: Failure to revoke key-compromised certificates within 24 hours → DigiCert: Failure to revoke key-compromised certificates within 24 hours
Whiteboard: [ca-compliance] → [ca-compliance] [leaf-revocation-delay]