Closed Bug 1709192 Opened 3 years ago Closed 3 years ago

IdenTrust: Unavailable CRL for IdenTrust ‘DST Root CA X3’.

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: roots, Assigned: roots)

Details

(Whiteboard: [ca-compliance] [crl-failure])

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36

Steps to reproduce:

Today, May 3, 2021, we learned that the CRL for our ‘DST Root CA X3’ was unavailable on April 30, 2021, which resulted in the inability to retrieve CRL information for the root, potentially impacting several customers. The issue was resolved on the same day (April 30).
We are investigating further and will post a complete incident report no later than May 14, 2021.

Assignee: bwilson → roots
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: Unavailable CRL for IdenTrust ‘DST Root CA X3’. → IdenTrust: Unavailable CRL for IdenTrust ‘DST Root CA X3’.
Whiteboard: [ca-compliance]
Type: defect → task

Complete Incident Report:

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
    IdenTrust: (All times stated in UTC)
    IdenTrust engineers first became aware of the problem when system monitoring reported that the DST Root CA X3 CRL had expired. The internal monitor alert was received on 04/30/2021 at 19:44:12 UTC.
  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
    IdenTrust:
    4/28/2021 – 19:50 UTC, IdenTrust completed the work to generate and update the 30-day root CRL for the DST Root CA X3
    4/30/2021
    19:42, CRL expired
    19:44, System alerting reported an expired root CRL for DST Root CA X3
    19:55, The issue was identified as a replication issue, and system engineers began working to resolve the problem
    20:17, System replication was resolved, and engineers verified that the CRL had synchronized across all hosts
    5/1/2021 00:04, The CDN cache was manually purged and the CRL was distributed to all hosts
  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
    IdenTrust: This problem did not involve certificate issuance or a misconfiguration in any certificate. It affected the ability to retrieve CRL information.
  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    IdenTrust: Not applicable
  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
    IdenTrust: Not Applicable
  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
    IdenTrust: When the 30-day root CRL was generated and uploaded, the newly generated CRL did not replicate immediately to all hosts. As a result, attempts to look up the root CRL would receive an expired CRL.
  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
    IdenTrust:
    Steps taken to resolve the situation:
    • The replication of the CRL to all hosts has been corrected. The issue occurred only under specific conditions that prevented replication. Completed 4/30/2021
    • A process change has been added to verify that future CRL updates are replicated to all hosts. Completed 5/1/2021

Steps that will take place to avoid recurrence:
• Alerting will be enhanced to notify system engineers of a CRL that will be expiring within 3 hours of expiration. This will be completed by 5/30/2021
• Monitoring will be enhanced to check for expired CRLs on the CDN. This will be completed by 6/30/2021
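
For readers who want to implement something equivalent, the replication check described in the steps above might be sketched as follows. This is only an illustration under stated assumptions, not IdenTrust's actual process: the host names are hypothetical, and the sketch assumes the CRL bytes served by each host have already been fetched by some other means.

```python
import hashlib
from typing import Dict, List


def crl_digest(crl_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a DER- or PEM-encoded CRL."""
    return hashlib.sha256(crl_bytes).hexdigest()


def find_stale_hosts(expected_crl: bytes, served: Dict[str, bytes]) -> List[str]:
    """List the hosts whose served CRL differs from the freshly generated one.

    `served` maps a host name (hypothetical) to the CRL bytes that host
    returned. An empty result means replication reached every host.
    """
    expected = crl_digest(expected_crl)
    return sorted(
        host for host, body in served.items() if crl_digest(body) != expected
    )


if __name__ == "__main__":
    new_crl = b"new-crl-bytes"  # placeholder content, not a real CRL
    responses = {
        "host-a.example": b"new-crl-bytes",
        "host-b.example": b"old-crl-bytes",  # replication did not reach this host
    }
    print(find_stale_hosts(new_crl, responses))
```

Comparing digests rather than only the expiry field also catches a truncated or corrupted copy, which is one of the CDN failure modes raised later in this bug.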

IdenTrust folks: A reminder that https://wiki.mozilla.org/CA/Responding_To_An_Incident includes the following:

You should also provide updates at least every week giving your progress, and confirm when the remediation steps have been completed - unless Mozilla representatives agree to a different schedule by setting a “Next Update” date in the “Whiteboard” field of the bug

Flags: needinfo?(roots)

Thanks for the friendly reminder; we had planned to post this update today:
Effective May 20, 2021 we have enhanced the alerting system to notify engineers when a CRL will be expiring within 3 hours.
We are on track to implement the other monitoring enhancement to check against expired CRL's on the CDN by June 30, 2021; we will post a note here if this task happens before that date.

Whiteboard: [ca-compliance] → [ca-compliance] Next update 2021-07-01

Since June 25, 2021, we have enhanced and verified monitoring to check for expired CRLs on the CDN.
We now consider this issue solved and made our external auditors aware for inclusion in the annual WebTrust audit report.

I will schedule this for closure on or about Friday, 2-July-2021.

Flags: needinfo?(bwilson)

• Alerting will be enhanced to notify system engineers of a CRL that will be expiring within 3 hours of expiration. This will be completed by 5/30/2021

Could you clarify which of the following this enhanced monitoring is?

1.) In the 3 hours after expiring, the engineers will receive an alert.
2.) In the 3 hours before expiring, the engineers will receive an alert.

I presume it is (2), but (1) could also be a valid explanation, and would appreciate it if this small wiggle room for interpretation was clarified.

Yes, #2, 3 hours prior to expiration.

Thanks!

From Comment #3:

We are on track to implement the other monitoring enhancement to check against expired CRL's on the CDN by June 30, 2021

From Comment #4:

Since June 25, 2021 we have enhanced and verified monitoring to check against expired CRLs on the CDN.

I don't see any discussion about what these enhancements were. I'm not sure if it's just a parsing fail on my part, namely, the statement can be read as:

  • "We now check our CDN for expired CRLs"

Or if it's meant to be

  • "We now perform (unspecified checks) of our CRLs on our CDN"

There have been a variety of incidents in the past related to CAs monitoring their public artifacts: everything from cache corruption at the CDN (leading to stale or corrupted resources served to users) to propagation delays to the CDN. It'd be a bit surprising if Identrust is only now checking CDN CRLs.

It'd be useful if Identrust can provide a more detailed explanation about the (external) monitoring they have in place for various services, and (ideal, but not required) how they map to various BR requirements. This seems like a check that "every" CA should already have had in place, and so it's useful to better understand what checks Identrust currently has in place, to see if there may be other expected or common checks that could be missing.

The issue of the 30-day root CRL that was not immediately replicated to all hosts has been corrected as explained in the incident report.
The additional monitoring we implemented is within the IdenTrust network checking that the externally published CRL copy, hosted by the CDN, is expiring within the next 3 hours.

The additional monitoring we implemented is within the IdenTrust network checking that the externally published CRL copy, hosted by the CDN, is expiring within the next 3 hours.

Gotcha. So this just means "We now check our CDN for expired CRLs". Thanks for clarifying.

Can you share what existing checks you have on your externally-published CRL?

I'm specifically trying to make sure we have a good understanding about how Identrust is preventing the class of issues (bad artifacts being published), rather than the specific issue (an expired CRL being published).

This is the only externally-published CRL we have in place and while we have extensive monitoring in place, this is specific monitoring that was added for the externally-published CRL used with a CDN.

I appreciate the answer, but it appears the concern is not being understood, which may explain why the latest response in Comment #12 doesn't address the concerns raised in Comment #11.

For example, Comment 12 says:

while we have extensive monitoring in place

While Comment #11 was trying to get Identrust to explain what that monitoring comprises, specifically, consistent with the statement in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report:

Additionally, they exist to help the CA community as a whole learn from potential incidents, and adopt and improve practices and controls, to better protect all CAs. Mozilla expects that the incident reports provide sufficient detail about the root cause, and the remediation, that would allow other CAs or members of the public to implement an equivalent solution.

As a concrete example, consider Apple's detailed response in Bug 1588001, Comment 21, after they had issues with their OCSP server. It provided a very clear picture about the steps in place for monitoring, and helped highlight gaps for other CAs to learn from.

Could Identrust provide more technical details about its monitoring related to CRLs published to CDN?

IdenTrust internal monitoring was added to ensure that this externally hosted root CRL is valid: http://crl.identrust.com/DSTROOTCAX3.crl.
The automated monitoring checks the CRL's expiration date every 5 minutes and alerts the system engineers if:
• (Warn) The CRL will expire within 72 hours.
• (Critical) The CRL will expire within 24 hours.
• (Critical) The CRL is expired
The monitor will continue to alert every 5 minutes until a new CRL is published by the CDN.

For shorter-lived CRLs, the monitor is configured with a shorter window:
• (Warn) the CRL will expire within 8 hours
• (Critical) the CRL will expire within 3 hours
• (Critical) the CRL is expired
The monitor will continue to alert every 5 minutes until a new CRL is published.

These monitoring configurations are in place for both internally hosted CRLs and CRLs that are hosted by the CDN.
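
The thresholds above could be expressed roughly as in the following sketch. The 72/24-hour and 8/3-hour windows come from this comment; the function name, the status strings, and the assumption that the CRL's nextUpdate time has already been parsed into a timezone-aware datetime are illustrative only and do not describe the actual monitor.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# (warn window, critical window), taken from the thresholds listed above.
ROOT_CRL = (timedelta(hours=72), timedelta(hours=24))
SHORT_LIVED_CRL = (timedelta(hours=8), timedelta(hours=3))


def classify(next_update: datetime, now: datetime,
             windows: Tuple[timedelta, timedelta]) -> str:
    """Map the time left until nextUpdate onto an alert level."""
    warn, critical = windows
    remaining = next_update - now
    if remaining <= timedelta(0):
        return "CRITICAL: expired"
    if remaining <= critical:
        return "CRITICAL"
    if remaining <= warn:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # Times from the incident timeline: the CRL expired at 19:42 UTC and
    # the alert fired at 19:44 UTC on 4/30/2021.
    expiry = datetime(2021, 4, 30, 19, 42, tzinfo=timezone.utc)
    alert_time = datetime(2021, 4, 30, 19, 44, tzinfo=timezone.utc)
    print(classify(expiry, alert_time, ROOT_CRL))
```

A monitor built this way would run the check every 5 minutes against both the internally hosted copy and the copy served by the CDN, since the incident showed that the two can diverge.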

Are there any other questions? If not, I will close this next Wed. 14-July-2021.

No further questions. I think classifying Comment #15 as extensive may be... a bit generous (as shown by Comment #13). There's obviously a lot of room for improvement, and it's clear other CAs have recognized so as well. I think it'd be deeply concerning if Identrust later has service issues, but I support closing it.

Flags: needinfo?(roots)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update 2021-07-01 → [ca-compliance] [crl-failure]