Closed Bug 1627614 Opened 5 years ago Closed 5 years ago

Let's Encrypt: Failure to revoke key-compromised certificates within 24 hours

Categories

(CA Program :: CA Certificate Compliance, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mpalmer, Assigned: jaas)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])


Steps to reproduce:

Between 2020-04-05 07:51:00 and 2020-04-05 07:51:21 (all times UTC), a total of 12 revocation requests were sent to cert-problem-reports@letsencrypt.org, notifying Let's Encrypt of a total of 42 certificates and precertificates which had been issued with compromised private keys. Each of these reports included a link to a crt.sh query listing the known-impacted certificates, as well as a link to a PKCS#10 format attestation of key compromise.

Actual results:

I received auto-acknowledgement e-mails from Let's Encrypt for these reports dated between 2020-04-05 07:51:04 and 2020-04-05 07:51:25.

The revocationTime in the OCSP responses I am currently seeing for all of these certificates is either 2020-04-06 08:19:09 or 2020-04-06 08:19:10 (primarily the latter). Both times are more than 24 hours after the time the certificate problem reports were submitted and after the time of the auto-acknowledgements I received.

Expected results:

Section 4.9.1.1 of the BRs requires CAs to revoke certificates within 24 hours when the CA has obtained evidence of key compromise.
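The arithmetic behind that claim can be sketched as follows. This is a minimal illustration using only the timestamps quoted in the report; the function name `overrun` is made up for the example. It pairs the latest submission time with the earliest revocationTime, which is the combination most favourable to the CA.

```python
from datetime import datetime, timedelta, timezone

# Timestamps quoted in the report above; all times are UTC.
last_report_sent = datetime(2020, 4, 5, 7, 51, 21, tzinfo=timezone.utc)
earliest_revocation = datetime(2020, 4, 6, 8, 19, 9, tzinfo=timezone.utc)

# The 24-hour window for key compromise cited from BRs 4.9.1.1.
REVOCATION_WINDOW = timedelta(hours=24)


def overrun(reported_at, revoked_at, window=REVOCATION_WINDOW):
    """Return how long after the deadline revocation happened.

    A negative result means the certificate was revoked in time.
    """
    return revoked_at - (reported_at + window)


late_by = overrun(last_report_sent, earliest_revocation)
print(late_by)  # 0:27:48
```

Even with the most favourable pairing, revocation lands roughly 28 minutes past the 24-hour mark.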

Matt: The revocation time expressed in the CRL or OCSP response is not, on its own, sufficient here. Your report does not indicate whether or not you recorded evidence of a failure to revoke.

Here is a table listing, for each certificate revoked at around that time:

  1. The CA-provided revocationTime;

  2. The last time at which The Revokinator received a validated OCSP response where certStatus=0 ("good"); and

  3. The first time at which The Revokinator received a validated OCSP response where certStatus=1 ("revoked").

    revocationtime | lastunrevokedresponse | firstrevokedresponse
    ---------------------+----------------------------+----------------------------
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.762038 | 2020-04-06 08:20:21.363634
    2020-04-06 08:19:10 | 2020-04-06 08:16:42.103358 | 2020-04-06 08:20:24.731198
    2020-04-06 08:19:09 | 2020-04-06 08:16:38.208337 | 2020-04-06 08:20:10.782942
    2020-04-06 08:19:09 | 2020-04-06 08:16:40.162595 | 2020-04-06 08:20:20.060133
    2020-04-06 08:19:10 | 2020-04-06 08:16:42.203639 | 2020-04-06 08:20:24.823135
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.066229 | 2020-04-06 08:20:19.538995
    2020-04-06 08:19:10 | 2020-04-06 08:16:42.011732 | 2020-04-06 08:20:24.630944
    2020-04-06 08:19:09 | 2020-04-06 08:16:38.09652 | 2020-04-06 08:20:10.107563
    2020-04-06 08:19:10 | 2020-04-06 08:16:42.31963 | 2020-04-06 08:20:24.918158
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.378991 | 2020-04-06 08:20:17.437362
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.749835 | 2020-04-06 08:20:18.792142
    2020-04-06 08:19:09 | 2020-04-06 08:16:42.41533 | 2020-04-06 08:20:25.014022
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.137739 | 2020-04-06 08:20:23.199512
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.949092 | 2020-04-06 08:20:22.11979
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.916499 | 2020-04-06 08:20:24.535931
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.46686 | 2020-04-06 08:20:18.100656
    2020-04-06 08:19:10 | 2020-04-06 08:16:37.672196 | 2020-04-06 08:20:07.567897
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.841186 | 2020-04-06 08:20:19.194164
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.179098 | 2020-04-06 08:20:16.681526
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.970563 | 2020-04-06 08:20:19.285912
    2020-04-06 08:19:10 | 2020-04-06 08:16:37.892081 | 2020-04-06 08:20:08.492301
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.857936 | 2020-04-06 08:20:21.880174
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.554435 | 2020-04-06 08:20:18.474003
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.00034 | 2020-04-06 08:20:08.993678
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.51691 | 2020-04-06 08:20:12.586751
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.287211 | 2020-04-06 08:20:17.200604
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.341327 | 2020-04-06 08:20:11.821462
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.424878 | 2020-04-06 08:20:12.341504
    2020-04-06 08:19:10 | 2020-04-06 08:16:37.766368 | 2020-04-06 08:20:07.969171
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.65476 | 2020-04-06 08:20:18.565025
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.72914 | 2020-04-06 08:20:23.786257
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.045493 | 2020-04-06 08:20:22.663187
    2020-04-06 08:19:10 | 2020-04-06 08:16:42.507066 | 2020-04-06 08:20:25.142946
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.824195 | 2020-04-06 08:20:24.023708
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.228692 | 2020-04-06 08:20:23.290947
    2020-04-06 08:19:10 | 2020-04-06 08:16:42.603717 | 2020-04-06 08:20:25.242163
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.608051 | 2020-04-06 08:20:13.098002
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.337058 | 2020-04-06 08:20:23.382173
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.383251 | 2020-04-06 08:20:20.793199
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.704899 | 2020-04-06 08:20:13.773825
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.796125 | 2020-04-06 08:20:14.427767
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.290732 | 2020-04-06 08:20:20.28141
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.653907 | 2020-04-06 08:20:21.129899
    2020-04-06 08:19:10 | 2020-04-06 08:16:39.088019 | 2020-04-06 08:20:16.138927
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.900108 | 2020-04-06 08:20:14.951427
    2020-04-06 08:19:10 | 2020-04-06 08:16:41.631862 | 2020-04-06 08:20:23.690599
    2020-04-06 08:19:10 | 2020-04-06 08:16:38.995766 | 2020-04-06 08:20:15.470681
    2020-04-06 08:19:10 | 2020-04-06 08:16:40.561811 | 2020-04-06 08:20:21.013109

In each case, as you can see, I received a "good" response from OCSP more than 24 hours after the certificate problem report was submitted, which, if nothing else, I believe violates 4.9.5's requirement for "published revocation" within 24 hours of receipt of the problem report.
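As a rough sketch of how the table supports that statement, the check below compares the last "good" observation against the 24-hour mark for the first three rows only. The names `ROWS` and `good_after_deadline` are hypothetical, and the latest submission time from the original report (2020-04-05 07:51:21 UTC) is used as the start of the window.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Latest submission time stated in the original report (UTC).
LAST_REPORT_SENT = datetime(2020, 4, 5, 7, 51, 21)

# First three rows of the table above:
# (lastunrevokedresponse, firstrevokedresponse)
ROWS = [
    ("2020-04-06 08:16:40.762038", "2020-04-06 08:20:21.363634"),
    ("2020-04-06 08:16:42.103358", "2020-04-06 08:20:24.731198"),
    ("2020-04-06 08:16:38.208337", "2020-04-06 08:20:10.782942"),
]


def good_after_deadline(last_good, report_sent=LAST_REPORT_SENT):
    """True if a 'good' OCSP response was seen after report time + 24 hours."""
    deadline = report_sent + timedelta(hours=24)
    return datetime.strptime(last_good, FMT) > deadline


results = [good_after_deadline(last_good) for last_good, _ in ROWS]
print(results)  # [True, True, True]
```

The same comparison can be applied to every row; each lastunrevokedresponse value in the table falls at about 08:16, after the 07:51 deadline.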

While we're on 4.9.5 violations, I also did not receive a preliminary report from the CA within 24 hours of submitting the certificate problem reports, as required by the first paragraph. Those preliminary reports were received starting from approximately 2020-04-06 09:07. I don't like to harp on those unnecessarily, because the important thing is that the certs get revoked, but I mention it in the interests of full disclosure.

Assignee: wthayer → jaas
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
QA Contact: wthayer → bwilson
Whiteboard: ca-compliance
Whiteboard: ca-compliance → [ca-compliance]

Summary
12 successive reports of key compromise arrived at our cert-problem-reports email address. Investigation, revocation and key blocking were completed 28 minutes after the 24-hour revocation deadline set out in 4.9.1.1.

How your CA first became aware of the problem.
12 emails were sent to our cert-problem-reports@letsencrypt.org reporting address at or near 2020-04-05T07:51:00Z. Each email indicated that a key had been compromised and contained a link to a PKCS#10 file signed using the reported key.

A timeline of the actions your CA took in response.

  • 2020-04-05T07:51:00Z: 12 emails arrived at our cert-problem-reports email address, containing evidence of key compromise for 12 keys.
  • 2020-04-05T19:04: Email notification sent to the SRE team about an unassigned ticket that had not yet been addressed.
  • 2020-04-05T21:30:00Z: Let’s Encrypt team started investigating the batch of compromised key reports.
  • 2020-04-06T07:51:00Z: 24 hours had elapsed since the initial report.
  • 2020-04-06T08:19:00Z: 24 serials associated with the reported compromised keys were found and revoked.

Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem.
Yes. At the time the reported certificates were revoked, the compromised keys were also added to our compromised keys blocklist to prevent future issuance using those keys.

A summary of the problematic certificates.
A total of 24 certificates were revoked as part of the aggregated email reports.

The complete certificate data for the problematic certificates.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
Our current process for handling key compromise reports is that our SRE and security teams receive a notification and begin their investigation. If an email goes unanswered for 11 hours, a secondary notification email is triggered. Only after the secondary notification came in were the initial email reports discovered and an investigation begun.
We were working on tools and automation for https://bugzilla.mozilla.org/show_bug.cgi?id=1625322 when the reports came in. We decided to bundle the work from the new reports with what we were doing there for efficiency purposes. Our plan was to finish the bundled work by the deadline, but we miscalculated the time and missed the deadline by 28 minutes. We have now identified this as a place where we need to add additional monitoring and alerting.
Some of the tooling and automation we were working on:

  • Verification of a certificate reported as compromised against the PKCS#10 bundle provided as evidence of private key control.
  • Searching all non-revoked, non-expired certificates in our database to determine which are affected by a report of key compromise (whether received via our cert-problem-reports email or via API revocation).
  • Automatically detecting when a revocation with the keyCompromise reason happens via our API and adding the key to our block list.
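The second and third items above can be illustrated with a small sketch. This is not Let's Encrypt's actual tooling; the record type, function names and the choice of a SHA-256 hash over the DER-encoded SubjectPublicKeyInfo as the key identifier are all assumptions made for the example, and the certificate database is reduced to an in-memory list. Verification of the PKCS#10 signature (the first item) needs a cryptography library and is left out.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CertRecord:
    """Hypothetical database row for an issued certificate."""
    serial: str
    spki_der: bytes        # DER-encoded SubjectPublicKeyInfo of the cert
    not_after: datetime
    revoked: bool = False


def key_id(spki_der: bytes) -> str:
    """Identify a public key by the SHA-256 hash of its SPKI bytes."""
    return hashlib.sha256(spki_der).hexdigest()


def affected_certs(records, compromised_spki, now):
    """Return non-revoked, non-expired records that use the compromised key."""
    target = key_id(compromised_spki)
    return [
        r for r in records
        if not r.revoked and r.not_after > now and key_id(r.spki_der) == target
    ]


def handle_key_compromise(records, blocklist, compromised_spki, now):
    """Revoke every affected record and block the key for future issuance."""
    hits = affected_certs(records, compromised_spki, now)
    for record in hits:
        record.revoked = True
    blocklist.add(key_id(compromised_spki))
    return [record.serial for record in hits]


# Placeholder byte strings stand in for real SPKI encodings.
now = datetime(2020, 4, 6, 8, 19, tzinfo=timezone.utc)
later = datetime(2020, 6, 1, tzinfo=timezone.utc)
earlier = datetime(2020, 3, 1, tzinfo=timezone.utc)
records = [
    CertRecord("01", b"key-A", later),
    CertRecord("02", b"key-A", earlier),  # same key, but already expired
    CertRecord("03", b"key-B", later),    # different key
]
blocklist = set()
revoked_serials = handle_key_compromise(records, blocklist, b"key-A", now)
print(revoked_serials)  # ['01']
```

Keying both the search and the block list on the same identifier means a single report can drive revocation of existing certificates and refusal of future requests for that key.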

List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

  • We previously had our cert-problem-reports ticketing system send emails to the team when a ticket had been unassigned for over 11 hours, but there was no on-call page alert associated. As a remediation we have added an on-call page in addition to this alert. In addition, we have added a 14 hour on-call page to additional team members as a backup.
  • Sped up our systems for finding all certificates that use a specific public key.
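The escalation thresholds described in the first item could be expressed along these lines. This is a minimal sketch under stated assumptions: the action labels, the `ESCALATIONS` table and the `due_escalations` function are hypothetical, and only the 11-hour and 14-hour thresholds come from the report.

```python
from datetime import datetime, timedelta, timezone

# Thresholds from the remediation list; action labels are illustrative only.
ESCALATIONS = [
    (timedelta(hours=11), "email team and page on-call"),
    (timedelta(hours=14), "page backup team members"),
]


def due_escalations(received_at, now, assigned):
    """Return the escalation actions that should have fired for a ticket.

    An assigned ticket triggers nothing; an unassigned one triggers every
    action whose threshold has been exceeded.
    """
    if assigned:
        return []
    age = now - received_at
    return [action for threshold, action in ESCALATIONS if age > threshold]


received = datetime(2020, 4, 5, 7, 51, tzinfo=timezone.utc)
print(due_escalations(received, received + timedelta(hours=12), assigned=False))
# ['email team and page on-call']
```

With both thresholds in place, a ticket that is still unassigned 14 hours after receipt leaves 10 hours of the 24-hour window for investigation and revocation.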
Flags: needinfo?(bwilson)

Ben: While Comment #3's response to Question 7 does not include a timeline for the changes, both are stated in the past tense, which suggests they have already been implemented.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [leaf-revocation-delay]