Closed Bug 1535735 Opened 6 months ago Closed 3 months ago

Entrust: Issued Certificates to incorrect Organization

Categories

(NSS :: CA Certificate Compliance, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bruce.morton, Assigned: dathan.demone)

Details

(Whiteboard: [ca-compliance])

Steps to reproduce:

Entrust Datacard issued OV SSL certificates that included the wrong organization name for one enterprise customer due to a mistake in updating account contact and vetting information.

1.    How your CA first became aware of the problem
 
Entrust Datacard noticed a discrepancy in the customer’s enterprise account vetting information when doing a routine account lookup where it was observed that the wrong organization name was applied to a specific customer certificate profile.
 
2.    A timeline of the actions your CA took in response
 
Dec 14, 2018 – The customer requested a change to the contact information for one of their administrators to reflect their sub-contractor’s organization name. When this was verified, the update also incorrectly changed the organization name that would be included as the organization value in all future certificates.
March 7, 2019 – The incorrect verified organization name was first noticed by Entrust Datacard while supporting another request from the customer to make account changes.
March 8, 2019 – Entrust Datacard confirmed the error and corrected the information for that specific customer account so that any certificate issued from that point forward would reflect the correct organization name.
March 11, 2019 – Investigation started to ensure other accounts were not impacted.
March 11, 2019 – Processes and systems reviewed to ensure that this was an isolated incident.
March 15, 2019 – Investigations completed and approved by management.
 
3.    Confirmation that your CA has stopped issuing TLS/SSL certificates with the problem
 
This problem was specific to a single customer account and occurred due to human error. The mistake was fixed as soon as it was confirmed, to prevent any further issuance of SSL/TLS certificates with incorrect organization information.
 
4.    A summary of the problematic certificates
 
For one enterprise customer account, the organization information was incorrectly changed to that of the sub-contractor that manages the customer’s certificates. The incorrect update took place on December 14, 2018, and the issue was first noticed on March 7, 2019. Over that period, the customer issued 48 certificates with the incorrect, pre-approved organization name.
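As a quick check of the exposure window described above, the elapsed time between the incorrect update and its detection can be computed from the dates given in the timeline. This is only an illustrative sketch; the variable names are not from the report.

```python
from datetime import date

# Dates taken from the timeline in section 2; names are illustrative.
incorrect_update = date(2018, 12, 14)  # organization name changed in error
first_noticed = date(2019, 3, 7)       # discrepancy noticed during an account lookup
corrected = date(2019, 3, 8)           # account information corrected

# Number of days during which certificates could be issued with the wrong name.
exposure_days = (first_noticed - incorrect_update).days
days_until_corrected = (corrected - incorrect_update).days

print(exposure_days)         # 83
print(days_until_corrected)  # 84
```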
 
5.    The complete certificate data for the problematic certificates
 
https://crt.sh/?id=1110810046
https://crt.sh/?id=1146272643
https://crt.sh/?id=1236926559
https://crt.sh/?id=1182386099
https://crt.sh/?id=1257120414
https://crt.sh/?id=1159768744
https://crt.sh/?id=1030170627
https://crt.sh/?id=1164906028
https://crt.sh/?id=1261917793
https://crt.sh/?id=1081394971
https://crt.sh/?id=1077796234
https://crt.sh/?id=1096209685
https://crt.sh/?id=1113811872
https://crt.sh/?id=1039570463
https://crt.sh/?id=1102237127
https://crt.sh/?id=1096445006
https://crt.sh/?id=1102221343
https://crt.sh/?id=1257267818
https://crt.sh/?id=1262639364
https://crt.sh/?id=1233939033
https://crt.sh/?id=1101988407
https://crt.sh/?id=1203790583
https://crt.sh/?id=1081210554
https://crt.sh/?id=1165195650
https://crt.sh/?id=1280445618
https://crt.sh/?id=1214796301
https://crt.sh/?id=1245136029
https://crt.sh/?id=1138555530
https://crt.sh/?id=1226872539
https://crt.sh/?id=1261672969
https://crt.sh/?id=1178850862
https://crt.sh/?id=1240345530
https://crt.sh/?id=1254150760
https://crt.sh/?id=1182470600
https://crt.sh/?id=1262350521
https://crt.sh/?id=1232939233
https://crt.sh/?id=1240473701
https://crt.sh/?id=1164734817
https://crt.sh/?id=1095934155
https://crt.sh/?id=1095916429
https://crt.sh/?id=1262673645
https://crt.sh/?id=1262703724
https://crt.sh/?id=1066746781
https://crt.sh/?id=1259813846
https://crt.sh/?id=1178468427
https://crt.sh/?id=1182214019
https://crt.sh/?id=1144723833
https://crt.sh/?id=1030298370
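
A list like the one above can be sanity-checked offline before it is posted. The sketch below extracts the numeric crt.sh IDs and reports duplicates; the helper names `crtsh_ids` and `duplicates` are hypothetical, and only the first three URLs from the list are used as sample input. Pasting the full list should yield 48 unique IDs if it matches the count stated in section 4.

```python
from urllib.parse import urlparse, parse_qs

def crtsh_ids(lines):
    """Extract the numeric `id` query parameter from crt.sh URLs, one URL per line."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parsed = urlparse(line)
        if parsed.netloc != "crt.sh":
            raise ValueError(f"not a crt.sh URL: {line!r}")
        values = parse_qs(parsed.query).get("id", [])
        if len(values) != 1 or not values[0].isdigit():
            raise ValueError(f"missing or malformed id: {line!r}")
        ids.append(int(values[0]))
    return ids

def duplicates(ids):
    """Return the sorted list of IDs that appear more than once."""
    seen, dupes = set(), set()
    for cert_id in ids:
        (dupes if cert_id in seen else seen).add(cert_id)
    return sorted(dupes)

# Sample input: the first three entries of the list above.
sample = """
https://crt.sh/?id=1110810046
https://crt.sh/?id=1146272643
https://crt.sh/?id=1236926559
""".splitlines()

ids = crtsh_ids(sample)
print(len(ids), duplicates(ids))  # 3 []
```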
 
6.    Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
 
The initial mistake was made due to human error. The incorrect organization name for the enterprise account allowed the customer to issue multiple certificates under the incorrect organization name from the time the mistake was made on December 14th, 2018 until the issue was detected. Entrust Datacard detected the issue when doing a routine account lookup while supporting the customer. Upon further review, the system that was used at the time to manage vetting and account information may not have been clear to some users.
 
7.    List of steps your CA is taking to resolve the situation
 
In January 2019, a new version of the vetting and account management portal was released with a new user interface which completely separates organization information that appears in the certificate from account contact information.
 
All mis-issued certificates will be revoked by March 20, 2019.

Assignee: wthayer → bruce.morton
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

To capture the questions from https://groups.google.com/d/msg/mozilla.dev.security.policy/Ts9cKe2j8Uk/b0AC4jqtCwAJ

  1. Why did a week elapse before filing an incident report?
  2. Why were the BRs not observed and these revoked on time?
  3. What's the basis for delaying revocation?
Flags: needinfo?(bruce.morton)

Ryan, my name is Dathan Demone. Bruce is out of the office and I will respond to your questions.

At first, upon our initial discovery, this did not appear to be a clear case of mis-issuance and it required some time to review and determine exactly what happened. A full review of the verification process which included the complete scope and impact of the problem was required before any conclusions could be reached. Immediately upon completion of the investigation, we reported the issue and scheduled revocation of all impacted certificates.

We started working with the customer to revoke the affected certificates on March 8th. The customer is a government IT service agency which must work with multiple state government agencies to replace all the certificates. They requested that we delay revocation to minimize the impact on these agencies. Scheduling revocation for all outstanding certificates for Wednesday, March 20th was the minimal time period that worked for the customer.

(In reply to Dathan Demone from comment #3)

> At first, upon our initial discovery, this did not appear to be a clear case of mis-issuance and it required some time to review and determine exactly what happened. A full review of the verification process which included the complete scope and impact of the problem was required before any conclusions could be reached. Immediately upon completion of the investigation, we reported the issue and scheduled revocation of all impacted certificates.

I'm not sure which question this is addressing.

It does not appear consistent with your timeline, which suggested it only took a single day to confirm the issue.

> We started working with the customer to revoke the affected certificates on March 8th. The customer is a government IT service agency which must work with multiple state government agencies to replace all the certificates. They requested that we delay revocation to minimize the impact on these agencies. Scheduling revocation for all outstanding certificates for Wednesday, March 20th was the minimal time period that worked for the customer.

While it may be perfectly valid to provide this as the answer, I do want to highlight the expectation that Entrust Datacard take systemic steps to prevent incidents of this nature in the future. As a consequence, were Entrust to have another misissuance, Entrust would need to meaningfully demonstrate how it took steps to ensure that the "minimal time period that worked for the customer" was consistent with the Baseline Requirements. I don't see those steps yet highlighted, but I do want to make sure it's at least clear that by providing this as the reasoning for delaying revocation, Entrust is committing to not use this rationale in the future by taking steps to address this.

Ryan, in one of your previous comments, you stated that there is a non-zero time to determine if there is a mis-issuance and the 5-day revocation counter starts after that.

https://bugzilla.mozilla.org/show_bug.cgi?id=1520876#c2

In this case, we had to ensure that there was a mis-issuance as we had 2 parties associated with the same account (the government organization and their sub-contractor).

Because this was a manual error, we had to investigate whether other accounts were impacted. It took us just over a business week to determine which accounts and certificates were impacted. Once we finalized this investigation, we set the 5 day counter for revocation.

We may have misinterpreted your post, but we thought the requirement for revocation and reporting mis-issuance was supposed to be implemented allowing for a non-zero-day period of investigation, followed by a 5 day revocation period. Please note that during the investigation there was communication with the subscriber and many certificates were replaced. As such, we believe that we either posted the notice late or are revoking the certificates late, but we were never planning to knowingly break both rules.

Either way, we will work to shorten this timeframe in the future.

(In reply to Dathan Demone from comment #5)

> Ryan, in one of your previous comments, you stated that there is a non-zero time to determine if there is a mis-issuance and the 5-day revocation counter starts after that.
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1520876#c2
>
> In this case, we had to ensure that there was a mis-issuance as we had 2 parties associated with the same account (the government organization and their sub-contractor).

That comment is saying quite the opposite - that the revocation timer does not begin when you notify the customer, as was inappropriately claimed in that issue, but when the issue is determined. I think it's useful to highlight, in that it demonstrates a pattern of Entrust having trouble implementing core requirements in a manner consistent with other CAs and broader expectations.

This understanding of the word "notification" to refer to when the subscriber was notified, as opposed to when Entrust is notified, is based on the immediately preceding answer by Entrust, stating that "The Subscriber of the certificate was notified".

The response, by Bruce, seems to further confirm this shared understanding, as it's based on the CA obtaining evidence - that is, prior to the investigation beginning, prior to the investigation completing, and prior to the notification (if any) to the Subscriber. As one can see throughout Section 4.9.1.1, the action taken is generally the moment the CA is made aware of an event or is notified of an event, which would precede many investigations and focus on when the CA first obtained the evidence; thus ensuring expeditious incident handling.

As it relates to the timelines of incident response, the CA should endeavor to notify programs as soon as possible, as per https://wiki.mozilla.org/CA/Responding_To_An_Incident . With respect to revocation delays, the advice is to notify prior to the expiration of the BR-mandated period (24 hours or 5 days).
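
To make the deadline arithmetic in this discussion concrete, the sketch below computes a revocation deadline from the moment the CA became aware of the problem and measures how far a revocation date overruns it. It is a simplified illustration, not a statement of the Baseline Requirements: the function names are hypothetical, midnight is assumed for all times, and which event starts the clock (March 7 or March 8 here) is exactly the point under dispute in this bug.

```python
from datetime import datetime, timedelta

# The two BR-mandated periods mentioned in this thread.
REVOCATION_WINDOWS = {
    "24 hours": timedelta(hours=24),
    "5 days": timedelta(days=5),
}

def revocation_deadline(aware_at, window="5 days"):
    """Latest revocation time, counted from when the CA became aware."""
    return aware_at + REVOCATION_WINDOWS[window]

def days_late(aware_at, revoked_at, window="5 days"):
    """Whole days by which revocation overran the deadline (0 if on time)."""
    overrun = revoked_at - revocation_deadline(aware_at, window)
    return max(overrun, timedelta(0)).days

# Dates from the incident timeline; times of day are assumed to be midnight.
confirmed = datetime(2019, 3, 8)   # Entrust Datacard confirmed the error
scheduled = datetime(2019, 3, 20)  # scheduled revocation date

print(revocation_deadline(confirmed).date())  # 2019-03-13
print(days_late(confirmed, scheduled))        # 7
```

If the clock is instead taken to start on March 7, when the discrepancy was first noticed, the same calculation gives a deadline of March 12.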

> we will work to shorten this timeframe in the future.

The specific questions in Comment #2 are meant to help understand how this happened, so that it is clear how the steps being proposed to shorten the timeframe would have meaningfully helped.

Thus, it's still useful to understand the specific answers to those questions, just as it is to understand why there was a delay between March 8 and March 11.

A useful example is to more concretely and comprehensively detail your incident response procedures, handling both internal and external reports. Understanding how Entrust Datacard handles incidents, and how this response aligned or did not align with those procedures, can help understand what steps are being taken in the future.

This is similar to the desire to understand the basis for delaying revocation and how mitigations are being placed for the future, as per Comment #4.

If Entrust Datacard believes its processes reflect best practice, and this was merely an unfortunate exception, then sharing those best practices helps ensure the entire ecosystem improves, while also providing feedback that might highlight unaddressed gaps. Similarly, if Entrust Datacard is concerned there may be a systemic issue at play, providing a deeper analysis about the existing procedures may help identify those flaws, and allow others to share successful best practices being used.

While we confirmed the specific customer issue on March 8th, we wanted to be sure that this was truly a case of human error and that no other customers were impacted before we reported the issue on this forum. Perhaps 7 days is too long to report this issue, but we also wanted to provide as much and as complete information as possible in the initial incident write-up. Our wish is to be as transparent as possible while respecting the requirements of the CA/B Forum, root programs and our customers. However, based on your previous comment, we admit that we misunderstood the guideline and that we should have reported this case as soon as we confirmed that something was wrong on March 8th. The rest of our investigation could have continued in parallel.

Our procedure to handle these types of incidents is to first perform a high level analysis of the issue involving multiple parties (Audit Specialists, Management, Compliance, and Security), confirm the specific nature of the issue, and then to answer questions such as how the problem was allowed to occur in the first place, if the issue was widespread or isolated, and how it can be fixed as quickly as possible. In parallel, we work with our customers as soon as we confirm that there might be an issue so that they can have as much time as possible to start preparing for certificate revocation and possible replacement. Our current process is to issue our public reports for these incidents as soon as we have completed our investigation.

Based on what we have learned from this incident, we will work to change our process such that we report the incident publicly as soon as it has been confirmed. In this particular case, had we been following a procedure where we report the incident as soon as it was confirmed, we would have reported the incident publicly on March 8th.

Dathan: Please confirm that all certificates were revoked by 20-March, and please update this bug describing the specific process changes mentioned in comment #7 and confirming that they have been implemented.

Assignee: bruce.morton → dathan.demone
Flags: needinfo?(bruce.morton) → needinfo?(dathan.demone)

All of the certificates listed in the original report were successfully revoked on or before March 20th.

As a follow up to comment #7, our process for handling these types of events moving forward is to report any incidents on this forum as soon as they are confirmed on our side, to provide as much information as possible at the time of reporting the incident, to continue our investigation in parallel to see if any other certificates are impacted, and to report any new findings as they are discovered. In terms of certificate revocation, we will plan to revoke the impacted certificates within 5 days of posting our initial report unless we have good reason to wait for some additional period of time.

Flags: needinfo?(dathan.demone)

(In reply to Dathan Demone from comment #9)

> In terms of certificate revocation, we will plan to revoke the impacted certificates within 5 days of posting our initial report unless we have good reason to wait for some additional period of time.

I again feel it necessary to highlight that this is not compliant with the Baseline Requirements' expectations.

Sections 4.9.1.1 and 4.9.5 place very clear limits and expectations on revocation, and these are not compatible with the above description.

To avoid ambiguity and future issues with delays in processing that result from Entrust's confusion around these requirements, it would be best to fully describe your operating procedures for handling certificate problem reports. The details in the CP/CPS (as required by 4.9.3) are not sufficient, in that they only detail reporting, not how Entrust will internally handle, escalate, review, and ensure compliance with the revocation timelines.

Flags: needinfo?(dathan.demone)

The CPS section 4.9.3 will be updated to address procedures for Certificate Problem Reports.

Our objective is to complete this by May 31st, 2019.

Flags: needinfo?(dathan.demone)

Dathan:

I'm not sure if there's confusion or something getting missed in translation, but I do want to make sure to resolve it before closing this issue, as it will potentially reflect poorly on Entrust on this issue if we don't have resolution, and certainly be far more significant in future issues. Given Comment #5, it's clear that Entrust has not understood what is required of them in the past, and thus it would be rather bad to continue that.

Could you please describe, on this bug, your operating procedures for handling incident response? From the moment a Certificate Problem Report is received, please describe how Entrust will internally handle, escalate, and review such incidents, and clearly indicate the timelines within which Entrust will take action.

This will help demonstrate if there is any further confusion, so that it may be prevented before another incident results due to Entrust's misunderstanding of the Baseline Requirements.

Flags: needinfo?(dathan.demone)

Ryan,

Here is our operating procedure for handling Certificate Problem Reports including our updated timelines:

  • Entrust Datacard receives a potential Certificate Problem Report (CPR). This CPR can originate from both internal (verification, compliance, support, security, R&D, operations, certificate checkers) and external sources (customers, affected parties, or other third parties).
  • The CPR is immediately escalated to our support team, who will log the CPR as high severity in a ticketing system for tracking purposes.
  • The support team will review the report and engage the necessary parties to determine if the CPR is true. If they determine the report to be true, a draft report must be created within 24 hours and provided to the Subscriber and the party that provided the CPR. If it is determined that the CPR is not true, the case is closed.
  • If the CPR is determined to be true, we will then determine if there is a real case of certificate mis-issuance. If we determine that the certificate was issued and verified according to our accepted procedures, we will close the issue.
  • If we determine that this is a case of mis-issuance, the incident is escalated to the Policy Authority team.
  • The Policy Authority team will draft a mis-issuance report and post it on the Mozilla system within 1 business day. Note: The mis-issuance report may be incomplete if the investigations are ongoing, however, immediately reporting the incident with as much information as possible will provide transparency to the community and may even result in useful feedback. We will continue to add information to the Mozilla bug if/as it becomes available.
  • A discussion around certificate revocation will be included in the mis-issuance report, and we will plan to schedule revocation for any mis-issued certificate to occur within 5 days of confirming that this was a case of mis-issuance. If revocation is requested by the subscriber, the certificate(s) will be revoked within 24 hours.
  • The mis-issuance report will be updated within 5 days from receipt of the CPR.
  • Note that all mis-issued certificates will be escalated to management for review.
Flags: needinfo?(dathan.demone)
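
The timelines in the procedure above can be sketched as a small deadline calculator. This is an illustration only, under several assumptions that are not part of Entrust Datacard's description: the names `cpr_deadlines`, `CprDeadlines`, and `next_business_day` are hypothetical, the 24-hour draft report is counted from receipt of the CPR, "1 business day" is modeled as the same time on the next weekday with holidays ignored, and the times of day in the example are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

def next_business_day(moment):
    """Same time of day on the next weekday (Mon-Fri); holidays are ignored."""
    candidate = moment + timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate

@dataclass
class CprDeadlines:
    preliminary_report_due: datetime        # draft report to Subscriber and reporter
    report_update_due: datetime             # mis-issuance report updated
    mozilla_report_due: Optional[datetime]  # only once mis-issuance is confirmed
    revocation_due: Optional[datetime]      # only once a revocation trigger exists

def cpr_deadlines(received_at, misissuance_confirmed_at=None,
                  subscriber_requested_at=None):
    mozilla_due = revocation_due = None
    if misissuance_confirmed_at is not None:
        mozilla_due = next_business_day(misissuance_confirmed_at)
        revocation_due = misissuance_confirmed_at + timedelta(days=5)
    if subscriber_requested_at is not None:
        # Subscriber-requested revocation uses the shorter 24-hour window.
        requested_due = subscriber_requested_at + timedelta(hours=24)
        revocation_due = (requested_due if revocation_due is None
                          else min(revocation_due, requested_due))
    return CprDeadlines(
        preliminary_report_due=received_at + timedelta(hours=24),
        report_update_due=received_at + timedelta(days=5),
        mozilla_report_due=mozilla_due,
        revocation_due=revocation_due,
    )

# Hypothetical replay of this incident; the 09:00 times are assumptions.
example = cpr_deadlines(
    received_at=datetime(2019, 3, 7, 9, 0),
    misissuance_confirmed_at=datetime(2019, 3, 8, 9, 0),
)
print(example.mozilla_report_due)  # 2019-03-11 09:00:00
print(example.revocation_due)      # 2019-03-13 09:00:00
```

Because March 8, 2019 was a Friday, the one-business-day reporting window in this replay ends on Monday, March 11.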

Thanks Dathan - this is exactly what I was hoping to understand :)

From your description, it sounds like there is a potential non-compliance issue, regarding:

> The support team will review the report and engage the necessary parties to determine if the CPR is true. If they determine the report to be true, a draft report must be created within 24 hours and provided to the Subscriber and the party that provided the CPR. If it is determined that the CPR is not true, the case is closed.

Compared with the Baseline Requirements, v1.6.4, Section 4.9.5, the following requirement exists:

> Within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both the Subscriber and the entity who filed the Certificate Problem Report.

Thus, if the support team determines that the CPR is not true, there is still an obligation to provide a preliminary report to the entity that filed the CPR. It sounds as if the procedures will need to be updated accordingly. Perhaps that was implied by "the case is closed", but given the need to provide that preliminary report, it may be worth examining the processes to ensure that's met.
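
The point above can be expressed as a simple check on a CPR handling record: the preliminary report is owed to both parties within 24 hours of receipt, whatever the outcome of the investigation. The sketch below is only an illustration; the function name, the recipient labels, and the example dates are all hypothetical.

```python
from datetime import datetime, timedelta

PRELIMINARY_REPORT_WINDOW = timedelta(hours=24)
REQUIRED_RECIPIENTS = {"subscriber", "reporter"}

def preliminary_report_problems(received_at, reports_sent):
    """List problems with the preliminary reports for one CPR.

    `reports_sent` maps a recipient label to the time the report was sent.
    Whether the CPR turned out to be valid is deliberately not an input:
    the obligation applies either way.
    """
    due = received_at + PRELIMINARY_REPORT_WINDOW
    problems = []
    for recipient in sorted(REQUIRED_RECIPIENTS):
        sent_at = reports_sent.get(recipient)
        if sent_at is None:
            problems.append(f"no preliminary report sent to {recipient}")
        elif sent_at > due:
            problems.append(f"preliminary report to {recipient} sent after {due}")
    return problems

# Hypothetical case: the CPR was judged "not true" and closed
# without a preliminary report going to the party that filed it.
received = datetime(2019, 5, 1, 10, 0)
closed_silently = {"subscriber": datetime(2019, 5, 1, 15, 0)}
print(preliminary_report_problems(received, closed_silently))
# ['no preliminary report sent to reporter']
```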

Hi Ryan, we agree with your comment.

Let's replace this text:
"The support team will review the report and engage the necessary parties to determine if the CPR is true. If they determine the report to be true, a draft report must be created within 24 hours and provided to the Subscriber and the party that provided the CPR. If it is determined that the CPR is not true, the case is closed."

with this text:
"The support team will review the report and engage the necessary parties to determine if the CPR is true. A draft report must be created within 24 hours and provided to both the Subscriber and the party that provided the CPR. A final report will also be provided to both parties when the case is closed."

Thanks, Bruce.

It appears that remediation is complete.

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Summary: Entrust - Issued Certificates to incorrect Organization → Entrust: Issued Certificates to incorrect Organization