Closed Bug 1724485 Opened 3 years ago Closed 3 years ago

SecureTrust: Delayed revocation of a customer revoke request

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andreaholland, Assigned: andreaholland)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

On August 5, SecureTrust Digital Certificate personnel were pulling revocation population data and found a customer revocation request for which the certificate was revoked within 5 days instead of the required 24 hours.

https://crt.sh/?id=4993896616

A full report will be posted in the coming days.

Assignee: bwilson → aholland
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [delayed-revocation-leaf]
  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

    On August 5, SecureTrust Digital Certificate personnel were pulling revocation population data and found a customer revocation request for which the certificate was revoked within 5 days instead of the required 24 hours.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

    10.07.2020 12:13 PM CDT OV certificate is issued.
    10.07.2020 7:57 PM CDT Customer downloads the certificate, begins the installation process, then realizes that they have lost the private key connected to this certificate request.
    10.08.2020 7:53 PM CDT Customer requests a DV certificate to replace the OV certificate and requests revocation of the OV certificate.
    10.08.2020 10:57 PM CDT Support personnel escalate the revocation request.
    10.12.2020 8:53 AM CDT Certificate is revoked.
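
As a rough check of the timeline above, the delay between the customer's revocation request and the revocation can be computed directly. This is a minimal sketch, assuming the dates are written MM.DD.YYYY, both timestamps are in CDT (UTC-5), and the 24-hour clock starts at the customer's request; the variable names are illustrative and not taken from any SecureTrust system.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CDT is UTC-5; a fixed offset suffices because both timestamps are in CDT.
CDT = timezone(timedelta(hours=-5), "CDT")

# Timestamps copied from the timeline (read as MM.DD.YYYY).
requested = datetime(2020, 10, 8, 19, 53, tzinfo=CDT)   # customer requests revocation
revoked = datetime(2020, 10, 12, 8, 53, tzinfo=CDT)     # certificate is revoked

# Deadline as stated in the report: revocation within 24 hours.
deadline = timedelta(hours=24)

elapsed = revoked - requested
overrun = elapsed - deadline

print(f"elapsed: {elapsed}")   # elapsed: 3 days, 13:00:00
print(f"overrun: {overrun}")   # overrun: 2 days, 13:00:00
```

Under these assumptions the revocation took 85 hours, which is 61 hours past the 24-hour deadline and consistent with the "within 5 days" description.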

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

    N/A

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

    N/A

  5. The complete certificate data for the problematic certificates.

    https://crt.sh/?id=4993896616

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

    We had already addressed the escalation failure by creating a separate email alias for revocation requests, which was implemented on Oct 21, 2020. The error was that we missed this customer request in our final review of requests for bug 1667799. We failed to review the roll-forward requests, i.e. requests that arrived after our data pull for the review of bug 1667799 but before the new email alias was active.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

    We will review the requests that arrived between Oct 1, 2020 and Oct 21, 2020, when our new alias became active (expected completion: mid-September). In the future, when creating solutions for these types of incidents, we will perform at least two passes over our data, one during the initial investigation, and one after the solution has been implemented.
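
A second pass of this kind could be sketched as a filter over revocation requests received in the review window. This is only an illustration under stated assumptions: the `RevocationRequest` record, the `late_requests` helper, and the two `example-*` entries are hypothetical; only the first record reflects the timeline in this report. Timestamps are naive and assumed to be in CDT.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RevocationRequest:
    request_id: str                   # hypothetical identifier
    requested_at: datetime            # when the customer asked for revocation
    revoked_at: Optional[datetime]    # None if the certificate is not yet revoked


def late_requests(requests: List[RevocationRequest],
                  window_start: datetime,
                  window_end: datetime,
                  as_of: datetime,
                  deadline: timedelta = timedelta(hours=24)) -> List[RevocationRequest]:
    """Return requests received in [window_start, window_end) that missed the deadline."""
    late = []
    for r in requests:
        if not (window_start <= r.requested_at < window_end):
            continue
        # A still-unrevoked request is measured against the review time.
        finished = r.revoked_at if r.revoked_at is not None else as_of
        if finished - r.requested_at > deadline:
            late.append(r)
    return late


requests = [
    # From the timeline in this report.
    RevocationRequest("timeline-cert", datetime(2020, 10, 8, 19, 53),
                      datetime(2020, 10, 12, 8, 53)),
    # Hypothetical records, for illustration only.
    RevocationRequest("example-on-time", datetime(2020, 10, 15, 9, 0),
                      datetime(2020, 10, 15, 17, 0)),
    RevocationRequest("example-out-of-window", datetime(2020, 11, 2, 9, 0),
                      datetime(2020, 11, 5, 9, 0)),
]

# Window covers Oct 1 through the end of Oct 21, 2020.
late = late_requests(requests, datetime(2020, 10, 1), datetime(2020, 10, 22),
                     as_of=datetime(2021, 8, 5))
print([r.request_id for r in late])   # ['timeline-cert']
```

Running the same filter once during the investigation and again after the fix is deployed, with the window extended to the deployment date, is one way to cover the roll-forward requests described in item 6.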

Thanks Andrea.

In the future, when creating solutions for these types of incidents, we will perform at least two passes over our data, one during the initial investigation, and one after the solution has been implemented.

It's important to understand: Why wasn't this already the workflow, given that a number of CA incidents stretching back a number of years have flagged the necessity of this?

I realize that, on its own merits, this might just seem like an administrative error and nothing to be concerned about, but I don't think that holds. The reason for concern here is that this sort of good practice should already have been recognized by CAs following other CA incidents; certainly, the need for such thorough analysis was well-known at the time of the event (Oct 21, 2020).

Can you better describe your process for monitoring Bugzilla and m.d.s.p., and how that monitoring turns into actionable changes at the CA? What's the process like for reviewing bugs to see if there are risks or good practices to adopt, and what, if anything, can be done to improve SecureTrust's ability to meet those expectations?

Flags: needinfo?(aholland)

It's important to understand: Why wasn't this already the workflow, given that a number of CA incidents stretching back a number of years have flagged the necessity of this?

We typically perform the second pass after implementing such solutions, but as it was not a formalized practice, it was missed in this case. This step has been added to our incident management process.

Can you better describe your process for monitoring Bugzilla and m.d.s.p., and how that monitoring turns into actionable changes at the CA? What's the process like for reviewing bugs to see if there are risks or good practices to adopt, and what, if anything, can be done to improve SecureTrust's ability to meet those expectations?

SecureTrust’s practice involves a weekly meeting with the CA team to consider m.d.s.p. discussions as well as Bugzilla tickets. During this meeting we discuss the existing tickets and topics, their potential impact on our business, the items that can be addressed, and the improvements to add to our roadmap. We generate our own compliance tickets from the Bugzilla bugs to keep track of these items. Currently no automation is applied to the tickets, but it has been discussed as a potential future improvement.

Flags: needinfo?(aholland)

I appreciate the quick answers, Andrea, but I think I'm still trying to get to a more systemic understanding.

We typically perform the second pass after implementing such solutions, but as it was not a formalized practice, it was missed in this case.

I think the question here remains as to why this wasn't a formalized practice, given the many issues we've seen. That is, the latter half - the description of SecureTrust's practices - suggests a process that has repeatedly failed to identify this as a necessary practice to formalize, despite it showing up in many CA incidents. This suggests that SecureTrust's current practices for evaluating issues are not taking into consideration the lessons it can learn from other CAs.

This may seem like I'm giving you a hard time for a bad judgement call, but I'm trying to understand how to make sure that "We see CAs X, Y, and Z have issues when Step Foo is not part of a formalized practice" actually results in SecureTrust realizing "Oh, we should add Step Foo as a formalized process, to prevent similar issues for SecureTrust". Figuring out why the process broke down - and why SecureTrust failed to recognize this pattern as a potential risk - is key to understanding how to improve.

Ideally, SecureTrust would have looked at previous bugs to see if other CAs failed to do multiple passes, and then tried to determine why SecureTrust failed to identify a similar risk to its business. Hopefully, your meeting notes would capture some of this. But I get the feeling that this would be a significant undertaking (to identify these past bugs, as well as to figure out what was discussed for them), and that strikes me as a clear area of improvement to prevent similar situations going forward.

The requests that arrived between Oct 1, 2020 and Oct 21, 2020 were reviewed with no additional incidents.

Please expect a response for the comment in the next few days.

Our weekly reviews are open-ended discussions that analyze the Bugzilla bugs for any risks to our systems, security, operations, customers, and the ecosystem as a whole. Because the discussions are open-ended, some points of a bug can receive more attention than others. We are formalizing a template for each CA incident to ensure we capture all aspects of the bug and record what, if applicable, we are doing to prevent such an incident from occurring at our CA.

We are monitoring this bug for any further questions or comments.

If there are no further questions or comments, we request this bug be closed.

Flags: needinfo?(bwilson)

I'll close this on or about Friday 10-Sept-2021.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [leaf-revocation-delay]