Closed Bug 1648472 Opened 4 years ago Closed 4 years ago

Entrust: SHA-256 hash algorithm used with ECC P-384 key

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bruce.morton, Assigned: bruce.morton)

Details

(Whiteboard: [ca-compliance] [ov-misissuance])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36

Steps to reproduce:

SSL certificates were issued from an ECC CA with a P-384 key, but were signed using the SHA-256 algorithm.

Actual results:

Certificates were signed with the SHA-256 algorithm, see attached file.

Expected results:

Certificates should have been signed using SHA-384 algorithm.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On 17 June 2020, the Entrust Datacard compliance team discovered, using the linting software available through crt.sh, that 16 SSL certificates had been signed using an ECC P-384 key but hashed using SHA-256.

Mozilla Policy v2.7 states that a certificate signed using an ECC P-384 key must be hashed using SHA-384. This requirement was introduced in Mozilla Policy v2.4, which became effective on 28 February 2017.
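
As an illustration of the pairing rule, the sketch below builds the DER AlgorithmIdentifier that would be expected for each issuer curve and compares it with what a certificate carries. The function and table names are hypothetical; the OIDs are the standard ecdsa-with-SHA256 and ecdsa-with-SHA384 identifiers, and the table assumes only the two pairings discussed in this report.

```python
# Hypothetical table: issuer curve -> OID of the ECDSA signature algorithm to use.
ECDSA_SIG_OID_FOR_CURVE = {
    "P-256": "1.2.840.10045.4.3.2",  # ecdsa-with-SHA256
    "P-384": "1.2.840.10045.4.3.3",  # ecdsa-with-SHA384
}


def encode_oid(dotted: str) -> bytes:
    """DER-encode an OBJECT IDENTIFIER (short-form length only)."""
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        # Base-128 encoding, most significant group first,
        # continuation bit set on every byte except the last.
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(reversed(chunk))
    return bytes([0x06, len(body)]) + bytes(body)


def expected_signature_algorithm(curve: str) -> bytes:
    """SEQUENCE containing only the OID; ECDSA identifiers carry no parameters."""
    oid = encode_oid(ECDSA_SIG_OID_FOR_CURVE[curve])
    return bytes([0x30, len(oid)]) + oid


def signature_matches_issuer_curve(curve: str, alg_id_der: bytes) -> bool:
    """True if the certificate's AlgorithmIdentifier is the one expected for the curve."""
    return alg_id_der == expected_signature_algorithm(curve)
```

With this sketch, a certificate issued from a P-384 CA but carrying the ecdsa-with-SHA256 identifier fails the check, which is the situation described in this report.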

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

17 June 2020: Issue discovered using crt.sh linting software. The problem occurs with two ECC SSL subordinate CAs, which are referred to as L1F and L1J.
17 June 2020: Investigation started, where it was determined that the problem is on both L1F and L1J CAs.
19 June 2020: L1J CA scheduled for migration, but due to QA issues was delayed to 24 June 2020. Note that L1F was already correctly configured on 19 May 2020.
24 June 2020: L1J CA configured to support SHA-384 signing.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

On 19 May 2020, L1F was migrated to new CA software which hashes using SHA-384.
On 24 June 2020, L1J was migrated to new CA software which hashes using SHA-384.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

L1F has 511 OV SSL certificates which have neither expired nor been revoked; the latest expiry date is 13 August 2022.
L1J has 95 EV SSL certificates which have neither expired nor been revoked; the latest expiry date is 18 September 2022.

  5. The complete certificate data for the problematic certificates.

Certificates are listed in the attached file, along with a calculated crt.sh link for each. Note that in some cases the Subscriber chose not to CT log the certificate, so the crt.sh link may fail.
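
The report does not say how the crt.sh links were calculated. One plausible approach, assumed here purely for illustration, is to search crt.sh by the SHA-256 fingerprint of the DER-encoded certificate; the function name is hypothetical.

```python
import hashlib


def crtsh_link(cert_der: bytes) -> str:
    """Build a crt.sh search URL from a certificate's SHA-256 fingerprint.

    The link only resolves if the certificate (or its precertificate)
    was logged to Certificate Transparency and indexed by crt.sh.
    """
    fingerprint = hashlib.sha256(cert_der).hexdigest().upper()
    return f"https://crt.sh/?q={fingerprint}"
```

A link built this way can be computed for any certificate, logged or not, which is consistent with the note that some links may fail.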

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The CAs were put into production in April 2016, which was before Mozilla changed their policy. It was understood that the CAs must use a version of SHA-2, and SHA-256 was chosen. When Mozilla Policy 2.4 was introduced, it was not observed that these CAs were not configured correctly.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Both CAs have been upgraded and configured to sign certificates using the SHA-384 algorithm. All SSL CAs are now configured by design to implement the Mozilla Policy.

Although the certificates have not been issued in accordance with Mozilla Policy, we are not planning to revoke these certificates. ECDSA P-384 with SHA-256 provides 128-bit security strength, which is the same security level as P-256 with SHA-256, which is allowed by the Mozilla Root Store Policy v2.7. However, some Subscribers may have thought that they were receiving 192-bit security strength since the key is P-384. As such, we will advise Subscribers of the issue and will offer certificate re-issue at no cost.
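
The strength comparison above can be checked with a small calculation. This is a sketch under a simplifying assumption: the strength of an ECDSA signature is taken as the lower of the curve's strength and the hash's collision resistance, using the commonly cited comparable-strength figures. The names are hypothetical.

```python
# Commonly cited comparable strengths, in bits (assumed for this sketch).
CURVE_STRENGTH_BITS = {"P-256": 128, "P-384": 192}
HASH_STRENGTH_BITS = {"SHA-256": 128, "SHA-384": 192}


def signature_strength_bits(curve: str, hash_name: str) -> int:
    """Strength of the combination is bounded by its weaker component."""
    return min(CURVE_STRENGTH_BITS[curve], HASH_STRENGTH_BITS[hash_name])


for curve, hash_name in [("P-256", "SHA-256"), ("P-384", "SHA-256"), ("P-384", "SHA-384")]:
    print(curve, hash_name, signature_strength_bits(curve, hash_name))
```

Under that assumption, P-384 with SHA-256 comes out at the same figure as P-256 with SHA-256, and only P-384 with SHA-384 reaches the higher level a Subscriber might expect from a P-384 key.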

Assignee: bwilson → bruce.morton
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [delayed-revocation-leaf]

(In reply to Bruce Morton from comment #1)

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

17 June 2020: Issue discovered using crt.sh linting software. The problem occurs with two ECC SSL subordinate CAs, which are referred to as L1F and L1J.

There's a significant lack of detail regarding the previous events, including the non-compliance found at other CAs, and which was discussed at Entrust.

Should we conclude, based on this timeline, that Entrust Datacard has been actively ignoring Mozilla changes, and ignoring discussion in mozilla.dev.security.policy? If that's not the case, could you please update this timeline to provide a more detailed picture of the discussions followed and steps taken?

It's probably useful to review https://wiki.mozilla.org/CA/Responding_To_An_Incident#Follow-Up_Actions , which specifically addresses this, and which this report is lacking.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The CAs were put into production in April 2016, which was before Mozilla changed their policy. It was understood that the CAs must use a version of SHA-2, and SHA-256 was chosen. When Mozilla Policy 2.4 was introduced, it was not observed that these CAs were not configured correctly.

This isn't an explanation of how and why the mistakes were made, and how they avoided detection until now. This is simply a statement about what went wrong.

I hope you'll update this to meet what Mozilla requires for incident reports?

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Although the certificates have not been issued in accordance with Mozilla Policy, we are not planning to revoke these certificates.

Could you explain how this complies with Mozilla Policy 6.1? Or was this an area that Entrust Datacard was also not familiar with the changes to in the past 3 years?

ECDSA P-384 with SHA-256 provides 128-bit security strength, which is the same security level as P-256 with SHA-256, which is allowed by the Mozilla Root Store Policy v2.7. However, some Subscribers may have thought that they were receiving 192-bit security strength since the key is P-384. As such, we will advise Subscribers of the issue and will offer certificate re-issue at no cost.

This doesn't seem to meet the level of https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

Overall, this report, while useful in identifying and disclosing a failure by Entrust Datacard to follow the same requirements that all CAs are expected to follow, doesn't really help build confidence that the underlying root cause has been addressed, that there's sufficient detail for the community to be assured it's been fixed and for other CAs to learn from it, or that Entrust Datacard will follow the same requirements that all CAs are expected to when an incident happens.

I'm hoping this was just an oversight in a rush to disclose before a CA/Browser Forum call, and that the next reply to this post will be a more suitable report that meets the expectations.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #2)

(In reply to Bruce Morton from comment #1)

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

17 June 2020: Issue discovered using crt.sh linting software. The problem occurs with two ECC SSL subordinate CAs, which are referred to as L1F and L1J.

There's a significant lack of detail regarding the previous events, including the non-compliance found at other CAs, and which was discussed at Entrust.

We discussed that we are using both pre-issuance linting and post-issuance linting using zlint. The zlint software used did not detect the error. We do occasionally use the online implementation used with crt.sh, which detected the problem on 17 June 2020. We also searched for other bugs with a similar issue and found this one: https://bugzilla.mozilla.org/show_bug.cgi?id=1527423.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The CAs were put into production in April 2016, which was before Mozilla changed their policy. It was understood that the CAs must use a version of SHA-2, and SHA-256 was chosen. When Mozilla Policy 2.4 was introduced, it was not observed that these CAs were not configured correctly.

This isn't an explanation of how and why the mistakes were made, and how they avoided detection until now. This is simply a statement about what went wrong.

I hope you'll update this to meet what Mozilla requires for incident reports?

The certificate profile was changed to support SHA-384 in April 2016, so about 10 months before the Mozilla requirement came into place. Unfortunately, the change was not implemented on either CA and the reason has not been determined. We plan to update our process to ensure that the approved certificate profiles are implemented and tested.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Although the certificates have not been issued in accordance with Mozilla Policy, we are not planning to revoke these certificates.

Could you explain how this complies with Mozilla Policy 6.1? Or was this an area that Entrust Datacard was also not familiar with the changes to in the past 3 years?

Entrust was aware of the requirement and per item 6 had previously added the requirement to the certificate profile. Entrust also reviews the Mozilla Policy upon each update and in this case, the certificate profile met the requirements. Per item 6, we will review and update our certificate profile implementation and testing process.

ECDSA P-384 with SHA-256 provides 128-bit security strength, which is the same security level as P-256 with SHA-256, which is allowed by the Mozilla Root Store Policy v2.7. However, some Subscribers may have thought that they were receiving 192-bit security strength since the key is P-384. As such, we will advise Subscribers of the issue and will offer certificate re-issue at no cost.

This doesn't seem to meet the level of https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

Agree, however we felt that the certificates as issued would protect the Subscribers and the Relying Parties through their validity period. We also reviewed bug https://bugzilla.mozilla.org/show_bug.cgi?id=1527423 and saw that revocation was not discussed or performed for this incident. We were hoping to provide our Subscribers with the same consistency.

Flags: needinfo?(bruce.morton)

(In reply to Bruce Morton from comment #3)

There's a significant lack of detail regarding the previous events, including the non-compliance found at other CAs, and which was discussed at Entrust.

We discussed that we are using both pre-issuance linting and post-issuance linting using zlint. The zlint software used did not detect the error. We do occasionally use the online implementation used with crt.sh, which detected the problem on 17 June 2020.

Perhaps I'm missing something, but I have a lot of trouble seeing how this is a useful or meaningful response. The interpretation to take away is "Entrust does not believe anything before this date is relevant", which is quite disappointing, especially given the specific call to carefully review the expectations.

We also searched for other bugs with a similar issue and found this one: https://bugzilla.mozilla.org/show_bug.cgi?id=1527423.

So in your extensive review, and your ongoing following of activity of mozilla.dev.security.policy, Entrust was unaware of and/or did not perform any examination for the following:

I'm encouraged you were able to locate the DigiCert issue, but it appears no careful analysis of this issue was performed, such as the validity periods of the certificates at the time the incident was reported.

The certificate profile was changed to support SHA-384 in April 2016, so about 10 months before the Mozilla requirement came into place. Unfortunately, the change was not implemented on either CA and the reason has not been determined. We plan to update our process to ensure that the approved certificate profiles are implemented and tested.

When can we expect an update on the reason determined? I think it's reasonable to be quite concerned when a CA doesn't implement a change in policy, doesn't follow discussions about violations of that policy, doesn't know why they didn't follow those discussions or make those changes, and doesn't plan to do anything about the certificates that violated the policy. This gives a very clear impression of a CA that simply doesn't care about adhering to policy, and it's unfortunate that this reply doesn't rise to the level expected.

If this seems like a harsh or emotional reply, it should be realized that the CA's incident report and handling is a key determination of trust. This is the CA's opportunity to demonstrate, beyond reproach, that they are capable of being trusted, that any failures were exceptional, and have been well researched into what went wrong and what's being done to improve. The subjective evaluation of these incident reports is to avoid the alternative of using an objective process that immediately distrusts CAs, which is far from ideal. So my hope is that, using the resources previously provided about both expectations and positive examples, Entrust will step up here and really look to distinguish itself with a quality incident report, treating any failing of policy as seriously as possible.

ECDSA P-384 with SHA-256 provides 128-bit security strength, which is the same security level as P-256 with SHA-256, which is allowed by the Mozilla Root Store Policy v2.7. However, some Subscribers may have thought that they were receiving 192-bit security strength since the key is P-384. As such, we will advise Subscribers of the issue and will offer certificate re-issue at no cost.

This doesn't seem to meet the level of https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

Agree, however we felt that the certificates as issued would protect the Subscribers and the Relying Parties through their validity period. We also reviewed bug https://bugzilla.mozilla.org/show_bug.cgi?id=1527423 and saw that revocation was not discussed or performed for this incident. We were hoping to provide our Subscribers with the same consistency.

I'm deferring to Ben, as I think Entrust is being grossly negligent here, that the material facts are demonstrably different, and the result is one that inspires little faith in whether or not Entrust is capable of viewing policy appropriately. Least of all because the previous efforts to highlight what is expected of Entrust here are willingly and intentionally being ignored.

Flags: needinfo?(bruce.morton)
Flags: needinfo?(bwilson)

(In reply to Bruce Morton from comment #3)

We discussed that we are using both pre-issuance linting and post-issuance linting using zlint. The zlint software used did not detect the error. We do occasionally use the online implementation used with crt.sh, which detected the problem on 17 June 2020.

What version/commit of ZLint are you using for your linting? Do you and your engineers subscribe to the announcements mailing list (https://groups.google.com/forum/#!forum/zlint-announcements) ? I believe the v2.1.0 release from May 22nd does detect this class of error.

Just to make sure to add color about what a detailed timeline would have also included:

  • 2016-11-07: Gerv opens policy issue #5 to introduce this requirement
  • 2017-04: Mozilla sends a communication reminding CAs to carefully review the Policy 2.4 changes, and Entrust responds "Check here to confirm that Mozilla's CA Certificate and CCADB policies have been reviewed and any necessary changes to your CA's policies or practices will be made by June 1, 2017."
  • 2020-01: Mozilla sends a communication reminding CAs to carefully review the policy differences and discussions, which Entrust declares "We have read, understand, and intend to fully comply with version 2.7 of Mozilla’s Root Store Policy"
  • 2020-05-14: Zakir Durumeric announces a new version of ZLint and where to subscribe to announcements
  • 2020-05-22: A posting of the new version is announced on the list linked by Zakir, including a list of the new lints

I simply find it unfathomable that a response to this incident would be "it was not observed that these CAs were not configured correctly", especially after explicit attempts were made to highlight and emphasize this. It suggests that, despite warranting to the community and Mozilla that Entrust will comply, the statement is merely one of "good intentions" rather than an actual process or procedure to ensure compliance. This might have been acceptable in 2015, but it's unconscionable in 2020 to think that sort of response is appropriate.

Entrust also reviews the Mozilla Policy upon each update and in this case, the certificate profile met the requirements

This is also difficult to reconcile, given the language of 2.4.1, the incident discussion, the subsequent 2.7, and the fact that this was still missed. I cannot see how this statement is remotely true, and that's why it's so deeply troubling, because it's difficult to distinguish whether this is deception, ignorance, or apathy. Ignorance seems difficult to square away, given the statements on m.d.s.p and efforts in policy 2.7 here explicitly to make ignorance an unacceptable answer here, but the alternatives are even more troubling.

(In reply to Daniel McCarney from comment #5)

(In reply to Bruce Morton from comment #3)

We discussed that we are using both pre-issuance linting and post-issuance linting using zlint. The zlint software used did not detect the error. We do occasionally use the online implementation used with crt.sh, which detected the problem on 17 June 2020.

What version/commit of ZLint are you using for your linting? Do you and your engineers subscribe to the announcements mailing list (https://groups.google.com/forum/#!forum/zlint-announcements) ? I believe the v2.1.0 release from May 22nd does detect this class of error.

I will need to confirm which version we are using. We do have a JIRA ticket open to implement v2.1.0. I agree that this will detect the error, which is how it was detected, as v2.1.0 is used with crt.sh linting. We do have two developers subscribed to the mailing list; perhaps I should add myself as well.

Thanks.

Flags: needinfo?(bruce.morton)

Re-setting N-I, because the concerns from Comment #4 / Comment #6 haven't been addressed.

This incident represents at least three separate, and distinct, failures to abide by Mozilla Policy and follow developments:

  • Failure to implement the original policy
  • Failure to be aware of the Mozilla discussions about the policy
  • Failure to properly examine the systems as specifically directed, in light of the confusion a limited number of CAs had

Mitigation of this issue minimally involves revocation, but also would have to involve a comprehensive analysis for why there were repeat failures, especially when all reasonable effort was made by Mozilla to communicate and educate on the importance of these changes. That's why this is such a serious and concerning issue.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #8)

Re-setting N-I, because the concerns from Comment #4 / Comment #6 haven't been addressed.

This incident represents at least three separate, and distinct, failures to abide by Mozilla Policy and follow developments:

  • Failure to implement the original policy
  • Failure to be aware of the Mozilla discussions about the policy
  • Failure to properly examine the systems as specifically directed, in light of the confusion a limited number of CAs had

Mitigation of this issue minimally involves revocation, but also would have to involve a comprehensive analysis for why there were repeat failures, especially when all reasonable effort was made by Mozilla to communicate and educate on the importance of these changes. That's why this is such a serious and concerning issue.

Here is an update to the timeline to include items before the incident. I hope these items help show that we believed we already met the algorithm requirement, were aware of the Mozilla policy, tracked the Mozilla policy changes, and attempted to comply with the Mozilla policy on an ongoing basis.

  1. 2016-04-06, documented the correct profile for implementation. This was not based on Mozilla or BR direction.
  2. 2016-11, we were aware of Gerv's discussion started in November 2016. We were not concerned with the discussion as our CAs were SHA-384 compatible and the current documented certificate profile already met the requirements.
  3. 2017-04, we did confirm that we reviewed the changes based on Mozilla Policy 2.4 and did respond to the Mozilla survey before the deadline. We were not concerned with the algorithm requirement as we thought it was already addressed.
  4. 2018-02, implemented post-issuance linting based on cablint to help ensure that the certificates were issued in accordance with the BR and EV documents.
  5. 2018-08-15, Entrust policy authority addressed Mozilla Policy 2.6 for notification and to ensure that we met the policy or have plans to meet the policy.
  6. 2018-11-15, Entrust policy authority addressed Mozilla Policy 2.6.1.
  7. 2019-02-13, Entrust policy authority discussed where Mozilla has an issue with CAs issuing certificates with ECC P-521 keys. Entrust had already stopped using P-521.
  8. 2019-05, implemented pre-issuance linting using zlint. The zlint software was also added to post-issuance linting.
  9. 2019-05-14, Entrust policy authority discussed that Mozilla is drafting policy v2.7.
  10. 2019-08-26, Entrust policy authority discussed that Mozilla is drafting policy v2.7.
  11. 2019-11-25, Entrust policy authority discussed that Mozilla is drafting policy v2.7.
  12. 2020-01, we did declare that we intended to comply with Mozilla Policy 2.7; we were not aware that we had an issue with the signature algorithm.
  13. 2020-02-24, Entrust policy authority discussed the recently published Mozilla policy v2.7, where owners and effective dates were discussed.
  14. 2020-05-13, we first became aware that a new version of zlint would be available based on the following email, https://lists.cabforum.org/pipermail/servercert-wg/2020-May/001909.html. This email also made us aware of the zlint-announcements@googlegroups.com mailing list, to which two developers subscribed.

Although in 2016 we documented the certificate profile, which met the future Mozilla requirements, we did not properly examine the systems to ensure they met the approved certificate profile specification. I believe that this is the root cause of the issue. Since we have had this error and certificate profiles will be addressed in an upcoming ballot, we plan to re-baseline our certificate profiles and update our process to ensure complete examination and testing of new and changed certificate profiles.

Flags: needinfo?(bruce.morton)

(In reply to Bruce Morton from comment #9)

  2. 2016-11, we were aware of Gerv's discussion started in November 2016. We were not concerned with the discussion as our CAs were SHA-384 compatible and the current documented certificate profile already met the requirements.
  3. 2017-04, we did confirm that we reviewed the changes based on Mozilla Policy 2.4 and did respond to the Mozilla survey before the deadline. We were not concerned with the algorithm requirement as we thought it was already addressed.
  12. 2020-01, we did declare that we intended to comply with Mozilla Policy 2.7; we were not aware that we had an issue with the signature algorithm.

Although in 2016 we documented the certificate profile, which met the future Mozilla requirements, we did not properly examine the systems to ensure they met the approved certificate profile specification. I believe that this is the root cause of the issue. Since we have had this error and certificate profiles will be addressed in an upcoming ballot, we plan to re-baseline our certificate profiles and update our process to ensure complete examination and testing of new and changed certificate profiles.

Thanks Bruce, this gets us closer to what seems like a root cause, and a recurring issue. Specifically, the policy authority thought the requirements were met, but did not seem to closely examine to actually confirm, despite reminders by Mozilla to do so, and despite discussions on m.d.s.p. highlighting issues where CAs made a mistake.

Whether or not a CA believes in good faith they are compliant doesn't really do much. That's why the CA Communications remind CAs to carefully examine their systems, and to make sure they're aware of the changes. Every change to Mozilla policy, or any other change (e.g. the Baseline Requirements) should be met by confirming both policy and implementation are correct. The use of linters like cablint and zlint is useful, but they're not a substitute for the careful analysis by CAs to make sure every change is compliant.

A remediation plan, in addition to the revocation of these misissued certificates, should include rethinking how the policy authority manages things. For example, ensuring new tests are written for each requirement (and, as with tools like zlint/cablint, potentially open-sourcing them). Validating through code audits that code actually implements the change. Reviewing all CA incidents for issues and carefully working with both policy and engineering to make sure that issue, and any potentially related issue, are examined in both implementation and policy. These are the things we expect of CAs, so that with every change, the CA has a full trail of evidence for why they believe both implementation and policy are correct.
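
As one illustration of a test written against a requirement, the sketch below compares a documented certificate profile with the configuration actually deployed on a CA and reports every field that differs. The field names and the dictionary representation are hypothetical; a real check would read both from the CA's own profile documents and configuration.

```python
from typing import Dict, List


def profile_mismatches(documented: Dict[str, str], deployed: Dict[str, str]) -> List[str]:
    """Return one message per documented field that the deployed config does not match."""
    problems = []
    for field, expected in documented.items():
        actual = deployed.get(field, "<missing>")
        if actual != expected:
            problems.append(f"{field}: documented {expected!r}, deployed {actual!r}")
    return problems


# Hypothetical example mirroring this incident: the document says SHA-384,
# the running CA still signs with SHA-256.
documented_profile = {"key_curve": "P-384", "signature_hash": "SHA-384"}
deployed_config = {"key_curve": "P-384", "signature_hash": "SHA-256"}

for problem in profile_mismatches(documented_profile, deployed_config):
    print(problem)
```

A check of this kind only has value if it runs against the live configuration, not against the document it was derived from.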

Again, the goal of incident reports is not to blame someone. It doesn't matter who on the policy authority was in charge of reviewing or that they believed things were correct. The goal is about ensuring the tools, processes, and procedures are robust enough to handle them making a mistake, or, in this case, three distinct and serious mistakes over a three year timeframe. Each of these actions by Mozilla (the policy discussion, the requirement to review m.d.s.p and the subsequent m.d.s.p discussion, the rewording in policy 2.7) are themselves designed to support CAs in doing that, but these secondary controls by Mozilla also failed.

Working to figure out how to manage this going forward is, I think, essential for answering Questions 6 and 7 of the incident report: root causes about how things (systemically) broke down, and steps being taken to (systemically) address those breakdowns.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #10)

Working to figure out how to manage this going forward is, I think, essential for answering Questions 6 and 7 of the incident report: root causes about how things (systemically) broke down, and steps being taken to (systemically) address those breakdowns.

Update to Incident Report Items 6 and 7

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The certificate profile document was created correctly. The certificate profile was not implemented correctly and the issue was not confirmed by initial test, which would have included a human review. Mitigating tests were put into the system over time, which included pre-issuance and post-issuance linting. The linting software used Entrust-developed code and third-party linting software. Unfortunately, no linting software captured the problem.

The problem avoided detection due to improper testing and linting software which did not address the requirement. The problem also avoided detection as there were no errors detected with any servers or clients which used the certificates.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

The problem will be addressed as follows:

  • CAs will be updated to correct the issue - this item is COMPLETE.
  • Certificate profile implementation process will be updated to ensure all certificate profile requirements are tested or observed. This process will also be extended to include certificate profile updates. This update should be complete by 31 July 2020.
  • Entrust linting software will be updated to specify the specific hashing algorithm assigned to each CA. Timeline TBD.
  • Third party linting software will be updated to a version which will match the key size and the hashing algorithm and detect the problem. Timeline TBD.
  • Entrust will revoke all certificates. All Subscribers will be contacted. Entrust will follow up with an Incident Report on late revocation of leaf certificates.
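
A lint that assigns a specific hashing algorithm to each CA, as described in the list above, might look roughly like the following. This is a sketch only: the record type, the table, and the function name are hypothetical, and the CA names are the L1F and L1J labels used in this report.

```python
from typing import Dict, List, NamedTuple


class IssuedCert(NamedTuple):
    serial: str
    issuing_ca: str
    signature_hash: str


# Hypothetical per-CA assignment of the required signature hash.
EXPECTED_HASH_BY_CA: Dict[str, str] = {"L1F": "SHA-384", "L1J": "SHA-384"}


def lint_signature_hash(cert: IssuedCert) -> List[str]:
    """Return lint errors for a certificate; an empty list means it passes."""
    expected = EXPECTED_HASH_BY_CA.get(cert.issuing_ca)
    if expected is None:
        # Fail closed: a CA with no assigned hash is itself a finding.
        return [f"{cert.serial}: no hash algorithm assigned to CA {cert.issuing_ca}"]
    if cert.signature_hash != expected:
        return [
            f"{cert.serial}: {cert.issuing_ca} must sign with {expected}, "
            f"found {cert.signature_hash}"
        ]
    return []
```

Treating an unlisted CA as an error means a newly added CA cannot issue without someone first recording which algorithm it is supposed to use.
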
Flags: needinfo?(bruce.morton)

(In reply to Bruce Morton from comment #11)

The certificate profile document was created correctly. The certificate profile was not implemented correctly and the issue was not confirmed by initial test, which would have included a human review.

Nothing in your process or proposals seems to address this systemic issue. Linting is not a systemic fix; it's the catch to highlight when there is a systemic failure. Why wasn't the profile implemented correctly, why wasn't it confirmed by initial test, why wasn't it detected during repeated communications from Mozilla and from the community? Those are systemic issues, and these are the questions that you should be trying to address with fixes.

The problem avoided detection due to improper testing and linting software which did not address the requirement. The problem also avoided detection as there were no errors detected with any servers or clients which used the certificates.

This isn't really useful though. Of course stuff that's forbidden could work. If you were using sequential serial numbers, for example, things could still work. If you were misencoding a DER extension, things could still work. The spectrum of "things that work, but are not valid" is as broad as the spectrum of things that are possible.

In order for this to be a meaningful answer, it means that the only thing that is used to test compliance is "does it work?" I'm sure you can understand why that would be a terrible outcome.

To understand what a good response might look like, try this sort of response on for size (I don't know how accurate it is, but it's trying to highlight the systemic thinking):

Root causes:

  • Problem: The original certificate profile was implemented incorrectly.
    • Detect: Certificate profiles will be updated to undergo a multi-party review, with at least two members from engineering and two members from policy authority ensuring things are correct.
    • Prevent: Certificate profiles can be error prone to configure, as they rely on UI configuration. Entrust will be working by XXXX-XX-XX to ensure that all certificate profile configuration is done through explicit file-based configuration.
  • Problem: Entrust Datacard assumed that the policy documentation matched implementation, and did not notice this misconfiguration.
    • Detect: Entrust will be completing a full review of all publicly-trusted certificate profiles, involving two members of engineering and two members of policy authority to ensure that policy documentation and system configuration are correct.
    • Prevent: As part of ongoing self-audits, Entrust Datacard will review a sample of at least 10% of its configured certificate profiles every X months to ensure they match as documented. The review of the certificate profiles will not involve any member of engineering or policy authority responsible for the original configuration.
  • Problem: Entrust Datacard did not review its system configuration to ensure compliance with Mozilla Policy.
    • Prevent: In addition to the ongoing self-assessments, each change to Mozilla Policy will trigger a review. Two members of the policy authority will develop a check-list of each change and how to assess it, for both positive and, where appropriate, negative tests. Two separate members, from both policy and engineering, shall review the checklists against the configured profile, to ensure things are correct.
  • Problem: Entrust Datacard did not believe that it was affected by the same issue previously discussed on m.d.s.p.
    • Detect: An e-mail alias will be subscribed to notifications to the "CA Certificates" component of NSS and m.d.s.p. Every week, the Policy Authority will review new messages and discuss all incidents from CAs that have been updated. For each incident, the Policy Authority will develop a plan to determine if Entrust may be affected by this issue, or related issue.
    • Prevent: The previously described steps with respect to configuration verification should ensure better alignment. At least every 3 months, a sampling of issues will be spot-checked against documentation and configuration as an additional layer.
  • Problem: Entrust Datacard was aware of the requirement change, but had no tools to detect compliance.
    • Mitigate: While Entrust Datacard makes use of ZLint, updates may take up to XX days in order to be qualified as correct for use within Entrust Datacard's system. To reduce that time to YY, Entrust Datacard will be ...
    • Detect: For each Policy change, Entrust Datacard will review all new requirements to examine for new lints and/or requirements. Entrust Datacard will develop new lints within ZZ days of adoption. To better assist the community in adopting these lints, Entrust Datacard will submit these upstream to the relevant linters and engage in good-faith efforts to ensure these lints are appropriately integrated upstream.

Like, that's just spitballing here, but it's trying to highlight that there are processes here designed to help get the right output, and they failed. Mistakes were made, which is human, but the CA's responsibility is to design systems that are robust against mistakes. My concern is that comments like Comment #11 are still focusing as if the issue is with these specific 16 certs, which sounds like a minor thing, but it's about the processes that led to that.

I'm glad y'all are fixing the configuration, I'm glad that y'all are updating linters. These are good things, and I don't want to lose sight of them, because they are important. But they're not systemic fixes, and that's the worry here.

Again, I'm glad y'all detected these, I'm glad you reported, and I'm glad we're making progress here, despite my strong words. However, I'm hoping the above illustrates the "systems-level" thinking we expect of CAs, and which the "good examples" of responding to incidents highlight.

Now, I don't know if everything above is appropriate, or even correct. There's definitely handwaving. But I'm hoping it provides a bit more useful of a template to think about how to address this for the root cause, so that the next policy, whether it's 2.7.1 or 2.8 or a new version of the BRs or what have you, has the sort of systemic, holistic review and remediation, and that Entrust never again misses the bar here.

Flags: needinfo?(bruce.morton)

Ryan, thank you for your input. This is very valuable and is in line with many of the steps that we were discussing internally. I plan to take your input and provide an update to this incident report within the next week or so.

Flags: needinfo?(bruce.morton)

Bruce: I'm concerned that 21 days have elapsed and Entrust has provided no updates here to an otherwise very serious issue.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #14)

Bruce: I'm concerned that 21 days have elapsed and Entrust has provided no updates here to an otherwise very serious issue.

Apologies for the delay. We are taking this very seriously and have had a number of internal meetings to address the root cause issues. I am currently working on a response and plan to post it today.

Flags: needinfo?(bruce.morton)

We have reviewed the root causes from comment 12. We plan to do the following to implement certificate profiles, monitor browser policies and test for compliance.

Implementation of Certificate Profile

  • Detect: Certificate profiles will be updated to undergo a multi-party review, with at least two members from Operations and one member from Policy ensuring things are correct.
  • Prevent: Certificate profiles are configured through explicit file-based configuration.

Check and Monitor Certificate Profile for Misconfiguration

  • Detect: Perform a full review of all publicly-trusted certificate profiles, involving Operations and Policy to ensure that policy documentation and system configuration are correct. The certificates will also be reviewed by CA specific post-linting software.
  • Prevent: As part of ongoing self-audits, review a sample of at least 10% of its configured certificate profiles on a quarterly basis to ensure they match as documented. The review of the certificate profiles will be performed by Security Compliance and will not involve any member of Operations or Policy personnel responsible for the original configuration.
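
As a rough sketch of the quarterly sampling step above: the function names, the random selection and the way reviewers are represented are all assumptions made for illustration, not a description of Entrust's tooling. The sample size is rounded up so that the "at least 10%" floor always holds.

```python
import random


def select_profiles_for_review(profiles, percent=10, seed=None):
    """Pick a random sample covering at least `percent` of the profiles."""
    profiles = list(profiles)
    if not profiles:
        return []
    # Integer ceiling of len * percent / 100, so the sample never falls
    # below the stated minimum, with at least one profile reviewed.
    size = max(1, (len(profiles) * percent + 99) // 100)
    return random.Random(seed).sample(profiles, size)


def eligible_reviewers(reviewers, original_configurers):
    """Exclude anyone who was responsible for the original configuration."""
    excluded = set(original_configurers)
    return [r for r in reviewers if r not in excluded]
```

With 25 configured profiles this selects 3 of them (10% rounded up), and any reviewer who configured a profile originally is filtered out before the review is assigned.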

Ensure System Configuration meets Browser Policies

  • Prevent: In addition to the ongoing self-assessments, each change to browser policy will trigger a review. Members of Policy and Security Compliance teams will develop a check-list of each change and how to assess it, for both positive and, where appropriate, negative tests. Two separate members, from both Policy and Operations, will review the checklists against the configured profile, to ensure implementation is correct.

Monitor Industry Discussed CA issues

  • Detect: Policy member(s) will be subscribed to notifications to the "CA Certificates" component of NSS and m.d.s.p. On a weekly basis, a Policy team member will review new messages to determine if there is impact to the Entrust CA or determine if the incident should be investigated. If an incident impacts Entrust's CA implementation, then a plan will be developed to rectify the issue.

Tools to Detect Compliance

  • Mitigate: ZLint will be updated 1) in pre-issuance linting within 3 months of a ZLint release and 2) in post-issuance linting software within 6 weeks of a ZLint release.
  • Detect: For each change to ZLint, a minimum of 6 months of previously issued certificates will be checked to detect issues.
  • Detect: For each Policy change, Entrust Datacard will review all new requirements to examine for new lints and/or requirements. Entrust Datacard will develop new lints in post-issuance linting software for the next release or patch.
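
The re-linting window in the Detect step above (a minimum of 6 months of previously issued certificates after each ZLint change) could be selected along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, certificates are represented as simple (serial, notBefore date) pairs, and "6 months" is read as calendar months.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped when the target month is shorter.
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def certs_to_relint(issued, linter_updated_on, months=6):
    """Return serials of certificates issued within the look-back window.

    `issued` is an iterable of (serial, not_before) pairs; this shape is an
    assumption for the example, not a real certificate database schema.
    """
    cutoff = months_before(linter_updated_on, months)
    return [
        serial
        for serial, not_before in issued
        if cutoff <= not_before <= linter_updated_on
    ]
```

Each selected certificate would then be passed through the updated linter; a longer look-back only requires changing `months`.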

The policy and process will be refined with experience of implementation.

(In reply to Bruce Morton from comment #11)

The problem will be addressed as follows:

  • Entrust linting software will be updated to specify the specific hashing algorithm assigned to each CA. Timeline TBD.

The linting software will be updated in a release targeted for 5 August 2020.

  • Third party linting software will be updated to a version which will match the key size and the hashing algorithm and detect the problem. Timeline TBD.

For post-issuance linting, ZLint will be updated in a release targeted for 5 August 2020.
For pre-issuance linting, ZLint will be updated in a patch targeted for 31 August 2020.

Certificate profiles will be updated to undergo a multi-party review
On a weekly basis, a Policy team member will review new messages

In one response, it's identified that single-party controls can easily give rise to human error, and so multi-party controls are used. In another, a single-party control is proposed.

I mean, I appreciate that Comment #16 seems to have folded in a number of recommendations, but I guess it's hard to tell whether there were any identified gaps or whether they were just taken as-is. My big concern is that, going forward, we see incident reports from Entrust take a more systemic, holistic response, like Comment #16, to try and cover the scenarios, and to provide sufficient detail about the situation and its failures to understand how those relate. The goal isn't to make CAs wear proverbial sackcloth, it's to try and make sure we're understanding how things go wrong, so that we can effectively collaborate on identifying solutions to avoid that going forward.

Whiteboard: [ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [delayed-revocation-leaf] Next Update - 5 August, 2020

(In reply to Ryan Sleevi from comment #18)

Certificate profiles will be updated to undergo a multi-party review
On a weekly basis, a Policy team member will review new messages

In one response, it's identified that single-party controls can easily give rise to human error, and so multi-party controls are used. In another, a single-party control is proposed.

I mean, I appreciate that Comment #16 seems to have folded in a number of recommendations, but I guess it's hard to tell whether there were any identified gaps or whether they were just taken as-is. My big concern is that, going forward, we see incident reports from Entrust take a more systemic, holistic response, like Comment #16, to try and cover the scenarios, and to provide sufficient detail about the situation and its failures to understand how those relate. The goal isn't to make CAs wear proverbial sackcloth, it's to try and make sure we're understanding how things go wrong, so that we can effectively collaborate on identifying solutions to avoid that going forward.

The recommendations from comment 12 were very complete and well suggested. Comment 16 is based on a full review of comment 12, which has been updated to fit with our processes and staffing. I do understand that single-party control has been implemented, but we think that the overall process mitigates the impact of each root cause. We plan to use this root cause approach for future Incident Reports.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [delayed-revocation-leaf] Next Update - 5 August, 2020 → [ca-compliance] [delayed-revocation-leaf] Next Update - 1-Sept. 2020

ZLint was updated for pre-issuance linting on 28 August 2020.

I believe this matter can be closed. I will schedule it for closure on or about 21-Sept-2020.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [delayed-revocation-leaf] Next Update - 1-Sept. 2020 → [ca-compliance] [delayed-revocation-leaf] Next Update - 21-Sept. 2020
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] [delayed-revocation-leaf] Next Update - 21-Sept. 2020 → [ca-compliance] [leaf-revocation-delay]
Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [ov-misissuance]