Bug 1521520 (Closed). Opened 5 years ago; closed 5 years ago.

Entrust: Late revocation of underscore certificate

Categories

(CA Program :: CA Certificate Compliance, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bruce.morton, Assigned: bruce.morton)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

Steps to reproduce:

Entrust did not revoke, before the 15 January 2019 deadline, all SSL certificates that contained an underscore character and were issued with a validity period of more than 30 days.

Actual results:

The search for all unexpired, unrevoked SSL certificates with underscore characters did not find all of them.

Expected results:

All SSL certificates that contained underscore characters and were issued with a validity period of more than 30 days should have been revoked before 15 January 2019.

  1. How your CA first became aware of the problem

Entrust Datacard did not revoke 9 underscore certificates by the deadline set in CA/Browser Forum ballot SC13, which required revocation prior to 15 January 2019.

Please note that before ballot SC13 was approved, Entrust Datacard worked with certificate subscribers to migrate away from underscore certificates. Subscribers were encouraged to revoke their certificates as they were being replaced. In the final week before the deadline, Entrust Datacard revoked 116 certificates in phases on January 9, 11, and 14, 2019.

  2. A timeline of the actions your CA took in response

(All times are UTC)
January 18, 2019, 15:10 - Notification received that not all certificates with underscore characters had been revoked
January 18, 2019, 16:31 - Investigation completed to determine which certificates had not been revoked on time
January 18, 2019, 16:55 - Cause of the error determined
January 18, 2019, 18:53 - All affected certificates revoked by this time

  3. Confirmation that your CA has stopped issuing TLS/SSL certificates with the problem

Entrust Datacard stopped issuing certificates with underscores on December 7, 2018.

  4. A summary of the problematic certificates

The certificates listed in section 5 all contain underscore characters and were issued with a validity period of more than 30 days. These certificates should have been revoked before January 15, 2019.
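The scope rule stated in this section can be expressed as a small predicate. This is an illustrative sketch only: the `Cert` record, its field names, and the functions `requires_revocation` and `revoked_late` are hypothetical, not Entrust's code, and the validity period is approximated as `not_after - not_before`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Revocation had to happen *prior to* this instant (treated here as UTC midnight).
DEADLINE = datetime(2019, 1, 15)
MAX_VALIDITY = timedelta(days=30)

@dataclass
class Cert:
    dns_names: List[str]                   # dNSName entries from subjectAltName
    not_before: datetime
    not_after: datetime
    revoked_at: Optional[datetime] = None  # None means not (yet) revoked

def requires_revocation(cert: Cert) -> bool:
    """In scope: an underscore in any dNSName and a validity period over 30 days."""
    has_underscore = any("_" in name for name in cert.dns_names)
    # Approximation: validity measured as not_after - not_before.
    long_lived = (cert.not_after - cert.not_before) > MAX_VALIDITY
    return has_underscore and long_lived

def revoked_late(cert: Cert) -> bool:
    """In scope, still unexpired at the deadline, and not revoked before it."""
    if not requires_revocation(cert):
        return False
    unexpired_at_deadline = cert.not_after >= DEADLINE
    revoked_in_time = cert.revoked_at is not None and cert.revoked_at < DEADLINE
    return unexpired_at_deadline and not revoked_in_time

if __name__ == "__main__":
    # Hypothetical certificate that was never revoked.
    never_revoked = Cert(["app_1.example.com"], datetime(2018, 6, 1), datetime(2020, 6, 1))
    print(requires_revocation(never_revoked))  # True
    print(revoked_late(never_revoked))         # True
```

Note that a certificate can be in scope yet not late, for example if it expired before the deadline or was revoked in time.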

  5. The complete certificate data for the problematic certificates

Here is the list of mis-issued certificates:

The following 5 certificates were revoked before the late revocation incident was identified.
https://crt.sh/?id=649482135
https://crt.sh/?id=806420919
https://crt.sh/?id=649481592
https://crt.sh/?id=649481807
https://crt.sh/?id=649482306

The following 4 certificates were revoked in response to the late revocation incident.
https://crt.sh/?id=1122847425
https://crt.sh/?id=359119726
https://crt.sh/?id=737281660
https://crt.sh/?id=910289000

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The report query used to find unrevoked certificates was flawed and missed two cases:
• Certificates that have been scheduled for delayed revocation. (8 certificates)
• Certificates that have been renewed and have not expired yet. (1 certificate)

  7. List of steps your CA is taking to resolve the situation

The SQL query used to find unrevoked certificates has been corrected, which will prevent this search error from happening again.

Bruce: thank you for reporting this. I have a few questions:

  • How was the problem detected?
  • Why were 5 of these certificates revoked before the problem was detected?
  • If a similar situation in which Entrust needs to ensure that a complete set of certificates has been identified happens in the future, will any changes be made to the process?
Assignee: wthayer → bruce.morton
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(bruce.morton)
Whiteboard: [ca-compliance]

(In reply to Wayne Thayer [:wayne] from comment #2)

> Bruce: thank you for reporting this. I have a few questions:
>
>   • How was the problem detected?
    We received notice from Netcraft that some of our certificates had not been revoked.
>   • Why were 5 of these certificates revoked before the problem was detected?
    We offer a service where a Subscriber can replace their certificate and have the current certificate revoked at a later time. Five certificates were revoked through this delayed process before we were advised of the problem.
>   • If a similar situation in which Entrust needs to ensure that a complete set of certificates has been identified happens in the future, will any changes be made to the process?
    Changes have been made to the query to avoid this issue in the future.
Flags: needinfo?(bruce.morton)

> Changes have been made to the query to avoid this issue in the future.

This answer doesn't really help us understand root causes or their remediation. It would be useful to understand the timeline for developing and reviewing this query, so as to understand, systemically, how things are being corrected going forward.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #4)

Upon investigation of the issue, we found that the search query was incorrect. We immediately corrected the search query and found 9 certificates that had been missed by our project to comply with ballot SC13. This query has been updated for future use.

Flags: needinfo?(bruce.morton)

Yes, that describes what went wrong, in about as much detail as the previous response. It does not describe how it went wrong, nor how systemic changes will prevent similar errors in the future, which is the goal of incident reporting. The purpose of this is to understand systemic issues and their fixes, and at present, there's no demonstration of that understanding or prevention.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #6)

Ryan: Here is information from our development team to address your question.

A SQL query was used to find active certificates with underscores. Here, “active” is a state in our database. However, there are cases where certificates that are not in the active state may contain underscores and need to be revoked. These cases were overlooked when writing the query.

Case #1: The customer has reissued an active certificate and opted to have it revoked in 30 days rather than immediately. The certificate is then in the “reissued” state and was therefore missed by the query. It is queued for revocation but not yet revoked. If the 30-day delay put the revocation after the January 15 deadline, the certificate was not revoked in time.
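The timing in Case #1 reduces to date arithmetic. Assuming the delayed revocation runs exactly 30 calendar days after reissue (day granularity, ignoring time of day), any certificate reissued on or after December 16, 2018 would be scheduled for revocation on or after January 15, 2019, and so would miss the deadline. The helper names below are hypothetical.

```python
from datetime import date, timedelta

# Ballot deadline: revocation had to happen *prior to* this date.
DEADLINE = date(2019, 1, 15)
# Assumption: delayed revocation runs exactly 30 calendar days after reissue.
REVOCATION_DELAY = timedelta(days=30)

def scheduled_revocation(reissued_on: date) -> date:
    """Hypothetical helper: the date a delayed revocation would run."""
    return reissued_on + REVOCATION_DELAY

def misses_deadline(reissued_on: date) -> bool:
    """True if the scheduled revocation falls on or after the deadline."""
    return scheduled_revocation(reissued_on) >= DEADLINE

# Latest reissue date whose delayed revocation still lands before the deadline.
LATEST_SAFE_REISSUE = DEADLINE - REVOCATION_DELAY - timedelta(days=1)

if __name__ == "__main__":
    print(LATEST_SAFE_REISSUE)                  # 2018-12-15
    print(misses_deadline(date(2018, 12, 15)))  # False (revoked 2019-01-14)
    print(misses_deadline(date(2018, 12, 16)))  # True  (revoked 2019-01-15)
```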

Case #2: The customer has renewed an active certificate. The certificate remains in the “renewed” state until it expires, but it is not revoked. If it expired after the January 15 deadline, it should have been revoked but was not.

This query was a custom query written for this one-time compliance event. It has been fixed to consider the other states an unrevoked certificate can be in, should we have another similar compliance event in the future. (The fix was to ignore certificate state and look only at revocation state.)
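A minimal sketch of the difference between the original and corrected queries, using Python's built-in `sqlite3` module. The table, columns, and sample rows are hypothetical (the bug does not describe Entrust's schema); the point is that filtering on the lifecycle `status` hides the “reissued” and “renewed” rows, while filtering on revocation state alone does not.

```python
import sqlite3

# Hypothetical schema; not Entrust's actual tables.
SCHEMA = """
CREATE TABLE certificates (
    id         INTEGER PRIMARY KEY,
    dns_names  TEXT NOT NULL,  -- SAN dNSNames joined with commas
    status     TEXT NOT NULL,  -- lifecycle state: active, reissued, renewed, revoked
    not_before TEXT NOT NULL,  -- ISO-8601 dates
    not_after  TEXT NOT NULL,
    revoked_at TEXT            -- NULL until actually revoked
);
"""

ROWS = [
    (1, "a_1.example.com", "active", "2018-06-01", "2020-06-01", None),
    (2, "b_2.example.com", "reissued", "2018-06-01", "2020-06-01", None),  # delayed revocation queued
    (3, "c_3.example.com", "renewed", "2018-06-01", "2020-06-01", None),   # renewed, not yet expired
    (4, "d_4.example.com", "revoked", "2018-06-01", "2020-06-01", "2019-01-11"),
    (5, "plain.example.com", "active", "2018-06-01", "2020-06-01", None),
]

# Flawed: filters on lifecycle state, so 'reissued' and 'renewed' rows are skipped.
FLAWED = """
SELECT id FROM certificates
WHERE status = 'active'
  AND instr(dns_names, '_') > 0
  AND julianday(not_after) - julianday(not_before) > 30
ORDER BY id
"""

# Fixed: ignores lifecycle state and looks only at revocation state (plus expiry).
FIXED = """
SELECT id FROM certificates
WHERE instr(dns_names, '_') > 0
  AND julianday(not_after) - julianday(not_before) > 30
  AND revoked_at IS NULL
  AND not_after >= '2019-01-15'
ORDER BY id
"""

def run(query: str) -> list:
    """Load the sample rows into an in-memory database and return matching ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO certificates VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

if __name__ == "__main__":
    print("flawed:", run(FLAWED))  # flawed: [1]
    print("fixed:", run(FIXED))    # fixed: [1, 2, 3]
```

`instr()` is used rather than `LIKE '%_%'` because `_` is a single-character wildcard in SQL `LIKE`, so that pattern would match every non-empty name.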

How will we prevent this next time? We will require at least one other person to review a report query for correctness, especially when the query is being used to enforce a compliance deadline.

Flags: needinfo?(bruce.morton)

Thanks for clarifying, Bruce.

It's slightly concerning that multiple reviewers weren't already part of the compliance activity, as one might expect for any CA compliance event. It's also not clear that this would have sufficiently prevented this issue; unfortunately, we can only speculate that a second set of eyes might have noticed the unnecessary restriction to only "active", as bugs happen even with code review.

One suggestion for further improvements is to cross-check your queries against public data sources, such as Certificate Transparency. While one hopes the CA's dataset contains a superset of everything in CT, we have seen, from various CAs, a variety of challenges similar to yours. For example, multiple datasets may be involved or may have been migrated in such a way as to prevent a holistic picture of issuance, whereas CT, through its "simplicity" of only storing the certificates, can provide a simpler interface.

Have you considered incorporating a process in which you cross-check your results with public data, to ensure that queries against internal datasources at least include the same degree of information present in external datasources?
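One way to express the suggested cross-check is a set comparison between certificates found in public CT data and those returned by the internal query, keyed on a stable identifier such as the SHA-256 certificate fingerprint. This is a hypothetical sketch: how the CT-side list is gathered (for example, from a crt.sh search) is left out, and since CT records issuance but not revocation, the comparison applies to the in-scope population before any revocation filter.

```python
from typing import Dict, Iterable, Set

def reconcile(ct_fingerprints: Iterable[str],
              internal_fingerprints: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare in-scope certificates seen in CT with those the internal query returned."""
    ct = set(ct_fingerprints)
    internal = set(internal_fingerprints)
    return {
        # In public CT data but missed internally: a likely gap in the query.
        "missed_by_internal": ct - internal,
        # Returned internally but not found in CT: worth investigating separately.
        "not_in_ct": internal - ct,
    }

if __name__ == "__main__":
    # Placeholder fingerprints; real ones would be 64-character hex SHA-256 digests.
    result = reconcile(ct_fingerprints={"fp1", "fp2", "fp3"}, internal_fingerprints={"fp1"})
    print(sorted(result["missed_by_internal"]))  # ['fp2', 'fp3']
    print(sorted(result["not_in_ct"]))           # []
```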

Flags: needinfo?(bruce.morton)

In retrospect, we did discuss using CT to help verify the results. In fact, since Netcraft spotted the issue, we may ask Netcraft in the future to do an external check. Thanks for the feedback.

Flags: needinfo?(bruce.morton)
Flags: needinfo?(wthayer)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(wthayer)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [leaf-revocation-delay]