Closed Bug 1663080 Opened 1 year ago Closed 8 months ago

IdenTrust: Issuance of certificates greater than 398 days

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: roots, Assigned: roots)

Details

(Whiteboard: [ca-compliance])

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36

Steps to reproduce:

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
    IdenTrust:
    As a result of a discussion on the ZLint GitHub repository (https://github.com/zmap/zlint/issues/467) related to RFC 5280 Section 4.1.2.5, we initiated an internal investigation and identified 2 mis-issued certificates whose validity period exceeded 398 days.
  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
    IdenTrust:
    9/2/2020: Initiated internal investigation to identify any related mis-issued certificates and was able to immediately revoke one of the 2 identified certificates.
    9/3/2020: Completed investigation and started working with customer to revoke/replace the second mis-issued certificate.
    9/3/2020: Updated configurations to set max validity period of 397 days.
  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
    IdenTrust:
    Yes, configurations have been updated to issue certificates with a maximum validity of 397 days, down from the previous setting of 398 days.
  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    IdenTrust:
    2 certificates issued:
    On 9/1/2020 at 13:22, expiring on 10/4/2021 at 13:22 - Total of 398 days – Replacement/revocation in progress.
    On 9/2/2020 at 23:10, expiring on 10/5/2021 at 23:10 - Total of 398 days – Revoked on 9/2/2020.
  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
    IdenTrust:
    3321211927; 3328474589
  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
    IdenTrust:
    Our interpretation of the requirement that certificates not exceed 398 days did not take into account the inclusive definition of the validity period found in RFC 5280, which caused the validity period to exceed 398 days by 1 second.
  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
    IdenTrust:
    Effective immediately, we have updated the certificate profiles and system configurations to reflect a maximum validity period of 397 days for end-entity SSL/TLS certificates, instead of the 398 days previously configured.
    We will post an update within 48 hours once we have confirmed that the second certificate has been revoked.
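The one-second overage described in item 6 can be reproduced with a short calculation. The following is a minimal sketch, not IdenTrust's or ZLint's actual check: it assumes the issuance and expiry timestamps from item 4 fall exactly on the minute (seconds are not given in the report), ignores time zones, and uses a hypothetical helper name `inclusive_validity`. It counts both the notBefore and notAfter instants as part of the validity period, as RFC 5280 Section 4.1.2.5 describes.

```python
from datetime import datetime, timedelta

# Upper limit discussed in this bug.
MAX_VALIDITY = timedelta(days=398)


def inclusive_validity(not_before: datetime, not_after: datetime) -> timedelta:
    """Validity period with both endpoints included (hypothetical helper).

    The plain difference not_after - not_before leaves out the final
    second, so one second is added to cover the notAfter instant itself.
    """
    return (not_after - not_before) + timedelta(seconds=1)


# Timestamps from item 4 of the report; seconds assumed to be :00.
reported = [
    (datetime(2020, 9, 1, 13, 22), datetime(2021, 10, 4, 13, 22)),
    (datetime(2020, 9, 2, 23, 10), datetime(2021, 10, 5, 23, 10)),
]

for not_before, not_after in reported:
    period = inclusive_validity(not_before, not_after)
    print(not_before.date(), period, period > MAX_VALIDITY)
```

Under these assumptions, the plain difference between the two timestamps is exactly 398 days for both certificates, so only the inclusive count pushes them over the limit.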
Type: defect → task

This certificate is now revoked: https://crt.sh/?id=3321211927

Assignee: bwilson → roots
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Can you explain more of the review process for CA/B Forum Ballot SC31? Your remediation calls out the step taken, which is to use 397 days, but this value was repeatedly discussed (in the CA/B Forum, in policy announcements by Apple and Google, and within SC31 itself), precisely to avoid these issues. So, for the purpose of understanding and assessing root causes, what led IdenTrust to select 398 days rather than 397 seems like a relevant part of the root cause analysis.

Flags: needinfo?(roots)

In February 2020, when Apple announced its decision to support a maximum validity period of only 398 days for SSL/TLS certificates, we updated our CP/CPS policy documents to reflect 398 days as the maximum validity period. The updated documents were published on May 21, 2020, and we proceeded to implement accordingly. This means our implementation was already underway during the CA/B Forum discussions about using 397 days rather than 398 days because of the potential to exceed 398 days by 1 second. Because the implementation of the validity period changes was already underway, there was an oversight in implementing the change to 397 days on all certificate profiles; the specific timing of issuance by a customer caused the one-second overage on the two SSL/TLS certificates identified.
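On the issuance side, the difference between a 398-day and a 397-day setting can be sketched as follows. This is an illustrative assumption about how a profile might derive notAfter, not a description of IdenTrust's systems; the names `POLICY_LIMIT`, `CONFIGURED_MAX`, and `not_after_for` are hypothetical. The sketch subtracts one second so that the inclusive period equals the configured value exactly, and it refuses any configured value above the policy limit.

```python
from datetime import datetime, timedelta

# Upper bound from the policies discussed in this bug.
POLICY_LIMIT = timedelta(days=398)
# Value the report says was configured on 9/3/2020.
CONFIGURED_MAX = timedelta(days=397)


def not_after_for(not_before: datetime,
                  validity: timedelta = CONFIGURED_MAX) -> datetime:
    """Pick notAfter so the inclusive period equals `validity` (hypothetical)."""
    if validity > POLICY_LIMIT:
        raise ValueError("configured validity exceeds the policy limit")
    # Both endpoints count toward the period, so step back one second.
    return not_before + validity - timedelta(seconds=1)


issued = datetime(2020, 9, 1, 13, 22)
expires = not_after_for(issued)
inclusive = (expires - issued) + timedelta(seconds=1)
print(expires, inclusive, inclusive <= POLICY_LIMIT)
```

Either safeguard alone (the lower configured value or the one-second adjustment) would keep the inclusive period within 398 days; the sketch applies both to leave a full day of margin.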

I see. The Apple policy mentioned noted the recommended value as 397, and was published on March 3, which is why I'm trying to understand more.

I suppose my general concern is that policies, whether the Mozilla Policy, the aforementioned Apple policy, or even the Baseline Requirements, tend to describe the absolute upper threshold for something, but in general, you want CAs to design to operate well below those tolerances and thresholds. I wasn't sure what factors would lead to 398 days, versus 397, or even 365, and more generally I'm trying to understand the design process when evaluating limits. Time limits are certainly the most obvious (e.g. the time for revocation, the lifetime of certificates), but I was hoping to understand more of the design and review methodology.

Could you share more details about the processes for reviewing and design? Is there the opportunity of similar risks, or situations that could approach deadlines in ways that might result in violations?

Our process for reviewing and designing any compliance-driven updates/changes includes the steps of Compliance Risk Review, Security Risk Review, PKI Architecture Design, Technical Architecture Design, Process/Procedure Design, User Experience, and Customer Impact/Benefit. One of our objectives in review and design is to maximize customer benefit as much as possible, which for this change meant we maximized certificate validity as much as possible while still maintaining compliance. The risk in this approach is that if we overemphasize maximizing customer benefit, we may inadvertently put compliance at risk. To avoid that in the future, we will need to moderate our pursuit of customer benefit when it is at cross purposes with compliance risk.

Is there something concrete that you have adopted or implemented which establishes that compliance risk takes precedence over customer benefit? Or are there other steps / procedures you have implemented as a result of this incident? In other words, what are the steps you have implemented to ensure that similar types of mis-issuances don't occur in the future? The reason I ask these questions is to ensure that there are processes in place to ensure that new requirements will be implemented in the future with sufficient "margins of error" so that compliance flags don't go off.

Yes, based on this incident report, we have updated our “CA/B Forum Monitoring and Reporting Process” document to incorporate a Risk Assessment section covering the areas listed in our comment #5. Sponsors of system changes are required to provide a documented analysis of the impact of the change, including Security Risk, Policy Compliance, PKI Architectural Design, User Impact, and User Experience. Emphasis is placed on Security Risk and Policy Compliance. Items with high impact in any of these categories are passed on to the IdenTrust Policy Management Authority and/or the Risk Management Committee for final approval.

Is there a risk that sponsors of changes will misjudge the relative impact before an item is sent to the Risk Management Committee and/or PMA?

In particular, I'm trying to understand how the description in Comment #7 is different from self-attestation, and what controls exist to ensure that the risk management is properly considered? For example, the described process seems to lack even a basic form of secondary/independent review. More generally, it's unclear whether the system change sponsor is qualified to adequately assess that risk, especially from an adversarial viewpoint, and I'm hoping you can describe more.

Yes, there is a possibility that sponsors of changes might misjudge or misinterpret a newly approved CA/B Forum compliance issue, as was the case in this incident. With the additional documented process we implemented as a result of this issue, SMEs from several departments now have an opportunity to assess the impact of new CA/B Forum compliance policy.

The updated documented process differs from self-attestation in that SMEs from other areas now contribute to the assessment, whereas before this evaluation was performed by a limited team.

I have no further questions. I intend to close this on or about next Wednesday, 31-March-2021.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Flags: needinfo?(roots)
Summary: IdenTrust Issuance of certificates greater than 398 days → IdenTrust: Issuance of certificates greater than 398 days