Closed Bug 1567659 Opened 5 years ago Closed 5 years ago

Entrust: SHA-1 Issuance and other misissuance while testing

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wthayer, Assigned: bruce.morton)

Details

(Whiteboard: [ca-compliance] [ev-misissuance])

In bug 1561013 comment #10, Entrust reported new misissuances that occurred while testing a procedure to remediate that incident, and described them as follows:

In finalizing the Phase 1 procedure, 4 certificates were misissued due to manual error.

The following certificate was issued with the incorrect profile; as a result, the certificate was signed using SHA-1 and has no subjectAltName.
https://crt.sh/?id=1681033012

The following 3 certificates were issued using the correct profile, but the profile had a spelling error in the URL for the OCSP response.
https://crt.sh/?id=1642629915
https://crt.sh/?id=1642630141
https://crt.sh/?id=1642630140

All 4 certificates have been revoked or have expired.

Please provide an incident report as described at https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report

(In reply to Wayne Thayer )

Please provide an incident report as described at https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

Entrust Datacard discovered the misissuance through testing when renewing certificates for the root embedding test sites. The problems were discovered on July 4 and July 8, 2019.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

July 4, 2019, 13:55 UTC - Manually issued an EV SSL certificate using the wrong profile; as such, the certificate was signed using SHA-1, had no SAN and the CDP was missing the HTTP URL. This certificate expired on July 6, 2019, 2:22 UTC.
July 4, 2019, 14:15 UTC - Started manually issuing 3 certificates using the correct profile. Unfortunately, this profile had a spelling error in the URL for the OCSP responses. Note that one certificate was revoked during this process as it was issued for a test site showing a revoked certificate.
July 8, 2019, 10:05 UTC - Discovered the spelling error in the certificates.
July 8, 2019, 20:51 UTC - Revoked one certificate. The others did not need to be addressed as they had either already been revoked or had expired.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

The CA has stopped issuing certificates using the incorrect policy and with spelling errors.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

One certificate signed with SHA-1 was issued on 4 July 2019.
Three certificates with the incorrect OCSP URL were issued on 4 July 2019.

  5. The complete certificate data for the problematic certificates.

The following certificate was issued with the incorrect policy and was signed using SHA-1.
https://crt.sh/?id=1681033012

The following 3 certificates were issued using the correct profile, but the profile had a spelling error in the URL for the OCSP response.
https://crt.sh/?id=1642629915
https://crt.sh/?id=1642630141
https://crt.sh/?id=1642630140

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The issuing CA is not set up for production as there are no third party certificates currently being issued. There is no automated issuance nor is there any pre-issuance linting set for a non-production CA. The certificates to support the BR required test sites were issued manually.

The problems occurred due to two reasons:

  1. The incorrect CA policy was chosen; as a result, the certificate was signed with SHA-1, had no subjectAltName, and had no HTTP URL for the CRL.
  2. A new certificate profile was created. This was not necessary as the CA already had a legitimate EV SSL certificate profile implemented. The new profile was tested in QA and a test certificate was issued. Multiple reviewers did not detect the typo in the OCSP URL.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

First, we will limit the number of certificates issued manually and move to a more automated issuance solution as noted in Bug 1561013. This solution will also be supported by our policy engine and pre-issuance linting.

In the interim, the CAs which are used to manually issue certificates have the correct default policy implemented and have an EV certificate profile which already meets the industry requirements. New certificates will be reissued as a recovery of existing certificates which use the current approved policy and profile.

In some cases the certificate cannot be issued by recovery. This is the case when the company data does not pass verification and must be updated. In this case, the manual issuance process will include the subject DN, SANs, certificate profile and validity period. The subject DN will be updated and the certificate will be issued using the same certificate policy as the certificate which it is replacing. The goal is that the manual process will not change the configuration of the CA, and that the previously approved and usable configuration will be used.

I'm not really sure how confident I am that this problem has been properly understood and mitigated.

Could you describe what controls you have around manual issuance today? Concretely, help walk me through the process and timeline from "Entrust decided to issue these certificates" until "July 4, 2019, 13:55 UTC". The events before this point, which were omitted, are critical to understanding the true root causes, because I do not believe that they've been sufficiently identified or mitigated by Entrust's proposed changes.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #2)

I'm not really sure how confident I am that this problem has been properly understood and mitigated.

Could you describe what controls you have around manual issuance today? Concretely, help walk me through the process and timeline from "Entrust decided to issue these certificates" until "July 4, 2019, 13:55 UTC". The events before this point, which were omitted, are critical to understanding the true root causes, because I do not believe that they've been sufficiently identified or mitigated by Entrust's proposed changes.

We issue certificates on a bi-annual basis to meet the requirements in BR section 2.2. Some certificates are issued through our normal process, which is the same process a customer would use to get a certificate. Others are issued manually, for root CAs which do not have a production EV CA configured to issue to customers. In this case the manual process was required. The user who issued the certificate made an incorrect profile selection and also made a typo in the certificate profile. Both are manual errors which can be avoided, as the CAs do have the correct policies and profiles implemented. In most cases, this error can be mitigated by using functionality which is built into the CA, that is, by issuing in recovery mode. This error can also be mitigated if the CAs are configured to issue through the same process we use for customers, where the policy and profile are fixed for the given certificate type. This is our final step to remove manual issuance.

Flags: needinfo?(bruce.morton)

Thanks Bruce.

I want to highlight the following, because it sounds like your process lacks many of the common controls seen at other organizations:

  • For manual issuance, Entrust does not require multi-party control or review
  • For routine tasks, Entrust does not maintain documentation or playbooks to reduce the risk of error
  • Entrust has not completed an in-depth analysis as to what factors may have contributed to the incorrect profile selection and the typo (e.g. training, multi-party controls)
  • Entrust's systems are still configured in a way that permits the selection of insecure combinations, rather than having them removed/retired

Thus, it seems reasonable to conclude that other routine tasks, which may require manual action, are just as likely to be error prone and result in risk for the community. It also seems reasonable to conclude that Entrust's security configuration can be easily circumvented by its staff, and adequate compensating controls may not exist.

This is why I wanted to provide an opportunity for a more detailed explanation: describing what is involved in this routine task could have helped Entrust demonstrate to the community what its processes and procedures are, as other CAs have done in their incident reports. While I'm concerned that the opportunity wasn't taken, I did want to provide another opportunity to describe exactly what happens (historically; that is, without your new changes) for such issuances, including who is involved, how they are notified, etc., in the event there are other controls that were omitted. Similarly, providing an equally detailed description of the new, post-incident world helps demonstrate how the risks have been addressed, and helps highlight opportunities for improvement that may have been inadvertently overlooked.

Flags: needinfo?(bruce.morton)

Thank you for the opportunity to provide more information.

I would like to confirm that Entrust does require multi-party control and review, and maintains documentation to reduce the risk of error. Please note that a manual issuance is not a routine task as it rarely occurs. We are moving to retire both manual issuance and insecure combinations as we migrate away from our old PKI software.

In this case, we follow what we call a "hand-roll" process. This is a process to provide information and approve the manual issuance of a certificate. The process is as follows:

1. Requester will want a certificate(s) hand-rolled.
2. Requester will determine the certificate type, validity, subject name, SANs and certificate profile.
3. Requester will ask Operations for the CSR(s).
4. Operations will provide CSRs.
5. Requester will review the CSR using a CSR policy checker, and check the key size and whether weak keys were used.
6. Requester will generate hand-roll request including the information from step 2 and the CSR.
7. Requester will send hand-roll request and certificate profile to Support and copy Verification, Operations and PA Chair.
8. Verification will review and approve the subject name and check CAA for the SANs.
9. Operations will generate the certificate(s).
10. Operations will review the certificate with the certificate profile.
11. Operations will check the certificate using cert lint, such as https://crt.sh/lintcert.
12. PA Chair or Operations Infrastructure Architect will review and approve the certificate for use.

In this case the failures happened at step 9. In order to issue the certificate, we need to have a certificate profile and a certificate specification. The certificate profile provides data to the CA for the certificate request. The certificate specification is built in as part of the CA for the different certificate types. The certificate profile and certificate specification were built in QA and a test certificate was issued and reviewed.

Two trusted roles went to the CA to issue the certificates. One role is to perform the task, the second role is to audit the task and review the outcome. On the CA, there are a number of out-of-the-box certificate profiles, plus custom profiles can be added. The new EV custom certificate profile and the new certificate specification were added.

The first error occurred because, during certificate issuance, another certificate profile with a similar name was chosen. As a result, an EV certificate was issued that was signed using SHA-1, had no SAN, and had a CDP that was missing the HTTP URL. This error was detected immediately through review. The process was performed again using the correct profile.
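
One way to surface this kind of name confusion before anyone has to pick a profile is to flag profile names that are nearly identical. The sketch below uses Python's standard `difflib` for that. The profile names and the 0.8 similarity threshold are hypothetical and are not taken from Entrust's CA configuration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(profile_names, threshold=0.8):
    """Return (name_a, name_b, ratio) for names whose similarity meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(profile_names), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# Hypothetical profile names, for illustration only.
profiles = ["EV SSL SHA256", "EV SSL SHA1", "Code Signing"]
for a, b, ratio in confusable_pairs(profiles):
    print(f"review before use: {a!r} vs {b!r} (similarity {ratio})")
```

A check like this does not prevent a wrong selection by itself; it only points out where renaming or retiring a profile would make the wrong choice harder to make.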

The second error occurred when the new certificate specification was requested. In this case, there was a typo in the certificate specification where "ocsp.entrust.net" was typed as "ocps.entrust.net". This error could also be seen in the certificate which was issued from the QA CA, but was missed under review. The production certificate was reviewed and also tested through certlint and the error was not detected. The error was detected when one of the certificates was being installed on the server.
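
A typo such as "ocps.entrust.net" is hard to spot by eye but easy to catch mechanically if the expected hostnames are written down once and compared exactly. The following is a minimal sketch under that assumption. The allowlist and the function name are hypothetical; only the two hostnames come from this report.

```python
from urllib.parse import urlsplit

# Assumption: approved hostnames are maintained separately from the profile itself.
APPROVED_HOSTS = {"ocsp.entrust.net"}


def unexpected_hosts(urls, approved=APPROVED_HOSTS):
    """Return the URLs whose hostname is missing or not on the approved list."""
    bad = []
    for url in urls:
        host = urlsplit(url).hostname  # None when the URL has no scheme/authority
        if host is None or host not in approved:
            bad.append(url)
    return bad


# The misspelled host is the one from this incident.
print(unexpected_hosts(["http://ocps.entrust.net"]))  # ['http://ocps.entrust.net']
print(unexpected_hosts(["http://ocsp.entrust.net"]))  # []
```

Running the same comparison against the QA certificate would, under these assumptions, have flagged the typo before the production issuance.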

We are not satisfied with this event and as a result are moving to reduce or eliminate manual issuance. The good news is that we are moving to new PKI software, where there is no user interface. All certificate issuances will be performed through an API. In practice, this means that certificate requests will be made through an account with pre-validated data. Each certificate request will move through the system policy engine, have CAA checked automatically, undergo pre-issuance linting and be logged to CT. We are confident that this will mitigate this type of misissuance.
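
As a rough illustration of why a fixed, API-driven sequence of checks removes the operator's choice from the loop, the sketch below runs a request through ordered gates and refuses issuance at the first failure. It is not Entrust's software: the `Request` type, the gate functions and the profile name "ev-ssl" are all hypothetical, and the CAA and CT steps are left out because they need network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    profile: str
    sans: List[str]
    log: List[str] = field(default_factory=list)


# Each gate returns None on success or a reason string on failure.
Gate = Callable[[Request], Optional[str]]


def policy_gate(req: Request) -> Optional[str]:
    # Assumption: exactly one (hypothetical) profile is approved for this account.
    return None if req.profile == "ev-ssl" else f"profile {req.profile!r} not approved"


def lint_gate(req: Request) -> Optional[str]:
    # Stand-in for pre-issuance linting; here it only checks that SANs are present.
    return None if req.sans else "subjectAltName is missing"


def run_pipeline(req: Request, gates: List[Gate]) -> bool:
    """Run every gate in order; stop and refuse issuance at the first failure."""
    for gate in gates:
        reason = gate(req)
        if reason is not None:
            req.log.append(f"{gate.__name__}: REJECTED ({reason})")
            return False
        req.log.append(f"{gate.__name__}: ok")
    return True


request = Request(profile="ev-ssl", sans=[])
print(run_pipeline(request, [policy_gate, lint_gate]), request.log)
```

The design point is that the gate list is fixed in code, so a request cannot skip a check or pick a different profile at issuance time.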

Flags: needinfo?(bruce.morton)

Thanks! This improved description goes a long way to addressing some of the concerns, and helps build up a better picture.

In terms of understanding where things went less than ideal, two things stand out for further discussion:

  • The existing CA had profiles still configured which were not valid for public issuance
  • A new profile was created, but in the process of generating that new profile, errors were introduced

It's not clear to me that the transition to API-based will reduce the risk of legacy profiles. We've seen other CAs have issues with API-based flows or legacy profile configurations (e.g. Bug 1552586, Bug 1550645, Bug 1556948, Bug 1485851). To what extent has Entrust audited its existing profile configurations, to ensure that the only usable profiles are ones currently conforming? I'm just trying to understand whether there's any possibility for any other Entrust employee to select the (incorrect) profile, or if it's limited in terms of technical and access controls. If it is limited today, what are those controls, and what steps are being taken to ensure they function correctly until the transition to the new platform?
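
To make the audit question concrete, here is a minimal sketch of a profile inventory check. The three checks mirror the three defects reported in this incident (SHA-1 signature, no subjectAltName, no HTTP URL in the CDP). The profile names, the dictionary layout and the example URLs are assumptions for illustration, not Entrust's actual configuration format.

```python
# Hypothetical profile inventory; the field names are assumptions for this sketch.
PROFILES = {
    "ev-ssl-current": {"hash": "sha256", "include_san": True,
                       "crl_urls": ["http://crl.example/ev.crl"]},
    "ev-ssl-legacy": {"hash": "sha1", "include_san": False,
                      "crl_urls": ["ldap://dir.example/cn=crl"]},
}


def profile_problems(profile):
    """List the reasons a profile should not be selectable for public issuance."""
    problems = []
    if profile["hash"].lower() in {"sha1", "md5"}:
        problems.append("weak signature hash")
    if not profile["include_san"]:
        problems.append("no subjectAltName")
    if not any(u.lower().startswith("http://") for u in profile["crl_urls"]):
        problems.append("no HTTP URL in CDP")
    return problems


def audit(profiles):
    """Map each non-conforming profile name to its list of problems."""
    report = {}
    for name, profile in profiles.items():
        problems = profile_problems(profile)
        if problems:
            report[name] = problems
    return report


for name, problems in audit(PROFILES).items():
    print(f"disable {name}: {', '.join(problems)}")
```

Any profile that appears in the report is a candidate for removal or for an access control that prevents it from being selected.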

Understanding that typos happen, and especially that the human brain is poorly suited for detecting such issues, how will the transition to the API flow improve this? My assumption is that you'll still be working on a list of profiles configured on the backend, and new profiles create new risks. What sort of changes have been added to the playbook for defining or generating new profiles? For example, if your QA process involved testing that the OCSP response and the CRLs were properly generated, presumably it would have flagged the ocps.entrust.net issue as an unresolvable domain / invalid CDP/AIA, and raised an alert sooner than when it was detected. Bug 1522975 is an example of a similar issue.
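
A QA check of that kind could be as small as trying to resolve every hostname found in a test certificate's AIA and CDP URLs. The sketch below assumes the URLs have already been extracted from the certificate. The function names are hypothetical, and the resolver is passed in as a parameter so the check can be exercised without network access.

```python
import socket
from urllib.parse import urlsplit


def resolves(host, resolver=socket.getaddrinfo):
    """Return True if the resolver finds at least one address for host."""
    try:
        return bool(resolver(host, None))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def unreachable_urls(urls, resolver=socket.getaddrinfo):
    """Return the URLs whose hostname is missing or does not resolve."""
    bad = []
    for url in urls:
        host = urlsplit(url).hostname
        if not host or not resolves(host, resolver):
            bad.append(url)
    return bad


# Example call (needs network access, so it is left commented out):
# print(unreachable_urls(["http://ocsp.entrust.net"]))
```

Name resolution is only the first step; a fuller check would also fetch the OCSP response and the CRL and confirm they are well formed.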

I'm looking at this in terms of trying to understand the patterns that the industry has found and responded to, as both of the issues highlighted above seem like things that the Incident Reporting process identified as opportunities to improve. In addition to responding to those specific suggestions, it might be useful to understand what steps Entrust takes to monitor such incidents (from other CAs), to identify and possibly mitigate similar risks within Entrust's own process and systems, or to even improve and standardize such approaches.

Thanks!

Flags: needinfo?(bruce.morton)

Hi Ryan: Thanks for the comments and questions. I have our development team working on a response. I will be on leave until 12 August 2019. I hope to have an update that week.

Flags: needinfo?(bruce.morton)

(In reply to Ryan Sleevi from comment #6)

It's not clear to me that the transition to API-based will reduce the risk of legacy profiles. We've seen other CAs have issues with API-based flows or legacy profile configurations (e.g. Bug 1552586, Bug 1550645, Bug 1556948, Bug 1485851). To what extent has Entrust audited its existing profile configurations, to ensure that the only usable profiles are ones currently conforming? I'm just trying to understand whether there's any possibility for any other Entrust employee to select the (incorrect) profile, or if it's limited in terms of technical and access controls. If it is limited today, what are those controls, and what steps are being taken to ensure they function correctly until the transition to the new platform?

There is minimal possibility for an employee to select the incorrect profile, because manual issuance almost never happens. If manual issuance does happen, it has to be approved and is performed under dual custody as described in comment #5.

Understanding that typos happen, and especially that the human brain is poorly suited for detecting such issues, how will the transition to the API flow improve this? My assumption is that you'll still be working on a list of profiles configured on the backend, and new profiles create new risks. What sort of changes have been added to the playbook for defining or generating new profiles? For example, if your QA process involved testing that the OCSP response and the CRLs were properly generated, presumably it would have flagged the ocps.entrust.net issue as an unresolvable domain / invalid CDP/AIA, and raised an alert sooner than when it was detected. Bug 1522975 is an example of a similar issue.

The certificate profiles are developed and reviewed. After approval, the certificate profiles are provided to our Development team for implementation. In QA testing, sample certificates are issued. These certificates are reviewed by QA, the Design Architect, and the Policy Authority Chair. The certificates are also tested through linting where the specific URLs are added to the linting tool. Also note that as we move to the API method, all proven certificate profiles will be migrated.

I'm looking at this in terms of trying to understand the patterns that the industry has found and responded to, as both of the issues highlighted above seem like things that the Incident Reporting process identified as opportunities to improve. In addition to responding to those specific suggestions, it might be useful to understand what steps Entrust takes to monitor such incidents (from other CAs), to identify and possibly mitigate similar risks within Entrust's own process and systems, or to even improve and standardize such approaches.

Entrust does monitor Mozilla discussions, which may include incident reporting. This monitoring may flag actions to improve our deployment of certificate management systems. We plan to increase our level of monitoring and action.

It appears that all questions have been answered and remediation is complete.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ev-misissuance]