Closed Bug 1825780 Opened 1 year ago Closed 10 months ago

Telekom Security: Improper use of a domain validation method

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Arnold.Essing, Assigned: Arnold.Essing)

Details

(Whiteboard: [ca-compliance] Next update 2023-04-21)

Attachments

(1 file)

During our annual ETSI audit, it was determined that a domain validation method implemented for internal customers only could also be used by external customers.
We are still analyzing and plan to provide a more detailed incident report no later than April 6, 2023.

Assignee: nobody → Arnold.Essing
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

After further analysis, it appears that a misunderstanding occurred during the audit because of misleading documentation, which the persons involved in the audit could not clarify at the time.
Because our auditors are on Easter vacation, we expect to clarify the circumstances after their return in week 16 (starting April 17th). Accordingly, we will provide an update to this bug within that week at the latest.

Nonetheless, due to the lack of clarity, the following immediate measures were taken:
- stopping the issuance of certificates via the method in doubt
- preventive revocation of the allegedly affected certificates
- filing the preliminary bug in Bugzilla

Whiteboard: [ca-compliance] → [ca-compliance] Next update 2023-04-21

As indicated in comment#1, the initial assumption from comment#0, that a validation method (BR#3.2.2.4.12) only permitted for internal customers was also used for external customers, has not been confirmed.
After further internal clarification with the colleagues responsible for the affected solution and after consultation with the auditors, the actual bug is that method BR#3.2.2.4.2, on which the "DENIC process" (see 6. below) is based, is not implemented in accordance with the BR.
Due to a misleading description in the RA Frontend, the underlying method was not immediately apparent to those involved in the audit, so the specific circumstances could not be clarified during the audit.
Remark: the "DENIC process" under consideration was specifically designed to use the functionalities of the “webwhois” process provided by DENIC, the central registry for ".de" domains, and thus has only been used for validating ".de" domains for OV and EV certificates in our solution named “Server.ID” (and only if the applicant has chosen this method for validation).

1) How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.
During our annual ETSI Audit, our auditor noticed the non-conformity

2) A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was performed.
2023-03-29 During our annual ETSI Audit, our auditor noticed the non-conformity
2023-03-29 14:00 First information to the management and stopping the issuance of certificates under the affected method
2023-03-30 13:22 Certificates issued with that method have been identified.
2023-03-30 14:11 Service Portal (Customer Frontend) has been modified to clarify that the validation method is no longer available
2023-03-30 15:30 Affected Customers were informed of the need for revocation within 24 hours and were asked to replace the certificates.
2023-03-31 10:30 Further call including management, compliance, personnel responsible for the solution, and RA. It was decided to extend the revocation deadline to a maximum of five days (see 7. below). In order to promote the timely replacement of the certificates, it was further decided that the registration staff will work on Saturday (2023-04-01) as well.
2023-03-31 15:29 Preliminary Incident Report has been published in Bugzilla.
2023-04-03 13:30 Further meeting including solution management personnel who had not been available until then. It was determined that a misunderstanding had occurred (see introduction) and that the process is actually supposed to correspond to method 3.2.2.4.2. In addition, it was determined that the auditors should be consulted to clarify the circumstances.
2023-04-04 11:39 Last affected certificate has been revoked.
2023-04-18 16:00 Final consultation with the auditor. The misunderstanding has been clarified (see introduction), but of course it still remains a non-conformity due to the improper implementation of 3.2.2.4.2.
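
For readers who want to check the intervals implied by this timeline, the following is a minimal sketch that computes the time from three of the listed events to the last revocation. The timestamps are copied from the entries above; the report does not state a time zone, so the sketch assumes all entries share one local zone. The event names are labels chosen for illustration, and the sketch takes no position on which event starts a revocation deadline under the BR.

```python
from datetime import datetime

# Timestamps copied from the timeline; the time zone is not stated in the
# report, so all values are assumed to be in the same local zone.
# The dictionary keys are illustrative labels, not terms from the report.
EVENTS = {
    "issuance_stopped": datetime(2023, 3, 29, 14, 0),
    "certificates_identified": datetime(2023, 3, 30, 13, 22),
    "customers_informed": datetime(2023, 3, 30, 15, 30),
    "last_revocation": datetime(2023, 4, 4, 11, 39),
}


def elapsed_until_last_revocation(start_key: str) -> tuple[int, int, int]:
    """Return (days, hours, minutes) from the named event to the last revocation."""
    delta = EVENTS["last_revocation"] - EVENTS[start_key]
    hours, remainder = divmod(delta.seconds, 3600)
    return delta.days, hours, remainder // 60


for key in ("issuance_stopped", "certificates_identified", "customers_informed"):
    days, hours, minutes = elapsed_until_last_revocation(key)
    print(f"{key}: {days} d {hours} h {minutes} min")
```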

3) Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.
Yes, we have stopped issuing certificates using the validation method in question.

4) In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.
126 active certificates from 25 external customers of our solution “Server.ID” were affected, see the attached list.

5) In a case involving TLS server certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. It is also recommended that you use this form in your list https://crt.sh/?sha256=[sha256-hash], unless circumstances dictate otherwise. When the incident being reported involves an SMIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA256 hash of the certificate. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
See attached file “certificates.csv”
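
As an illustration of the link form recommended in question 5, here is a small sketch that turns a SHA-256 fingerprint into a crt.sh URL. The layout of the attached certificates.csv is not shown in this bug, so the sketch works on a single fingerprint string only; the function name and the separator handling are assumptions made for the example.

```python
import re

# URL form taken from the incident report template quoted above.
CRT_SH_TEMPLATE = "https://crt.sh/?sha256={fingerprint}"
_SHA256_HEX = re.compile(r"[0-9A-Fa-f]{64}")


def crt_sh_url(fingerprint: str) -> str:
    """Build a crt.sh lookup URL from a SHA-256 fingerprint.

    Accepts plain hex or colon/space separated hex (an assumption about
    how fingerprints might be written); rejects anything else.
    """
    cleaned = fingerprint.strip().replace(":", "").replace(" ", "")
    if not _SHA256_HEX.fullmatch(cleaned):
        raise ValueError("expected 64 hexadecimal characters (SHA-256)")
    return CRT_SH_TEMPLATE.format(fingerprint=cleaned)


if __name__ == "__main__":
    # Placeholder value for demonstration only, not a real certificate.
    print(crt_sh_url("ab" * 32))
```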

6) Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
The "DENIC process" for validating ".de" domains is implemented as follows:
If a customer selects the method under consideration for validating a ".de" domain, the CA automatically sends an email to the registered contact person with the instruction to request a link to view the data registered with DENIC via the webwhois functionality of DENIC.
After validation of the request, DENIC automatically sends an email to the domain contact registered for that domain containing a link valid for 48 hours for viewing the registered domain info. This link contains a random value of 42 characters (uppercase, lowercase, digits).
The customer forwards this temporary link to the CA (via upload in the customer frontend), whereupon the RA is informed and can retrieve the domain information within 48 hours directly from DENIC via the temporary link. The RA validates the retrieved domain information against the data provided by the identified applicant, i.e., the domain itself as well as the domain owner and the domain contact.
Thus, with regard to BR#3.2.2.4.2, the implementation covers the requirement that the domain contact is contacted via an email containing a random value with sufficient entropy and the CA uses this random value for validation. For this reason, method BR#3.2.2.4.2 was considered to be fulfilled when implementing the DENIC process.
However, when implementing the process, the definition of “Random Value” according to BR#1.6 was not fully taken into account, i.e., the requirement that a random value shall be generated by the CA was not considered.
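
To make the distinction concrete, the sketch below first estimates the entropy of a 42-character value drawn from uppercase letters, lowercase letters, and digits, as described for the DENIC link, and then shows what generating a Random Value on the CA side could look like. The 112-bit minimum comes from the BR definition of "Random Value"; the function names and the default length are assumptions for illustration, and nothing here reflects how DENIC or Telekom Security actually implemented their systems. The entropy estimate only holds if every character is drawn uniformly from a CSPRNG.

```python
import math
import secrets
import string

# 26 uppercase + 26 lowercase + 10 digits = 62 symbols.
ALPHABET = string.ascii_letters + string.digits
# Minimum entropy from the BR definition of "Random Value".
MIN_ENTROPY_BITS = 112


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a string whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def generate_random_value(length: int = 32) -> str:
    """Generate a Random Value locally, i.e. on the CA's own systems.

    Hypothetical helper: uses the standard-library CSPRNG interface and
    refuses lengths that cannot reach the 112-bit minimum.
    """
    if entropy_bits(length) < MIN_ENTROPY_BITS:
        raise ValueError("length too short for 112 bits of entropy")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(f"42 characters: about {entropy_bits(42):.0f} bits")
    print(generate_random_value())
```

A 42-character value of this kind comfortably exceeds 112 bits, so the gap described above is not the amount of entropy but who generates the value: the definition requires the CA to specify it, which a value produced by a third party does not satisfy.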

7) List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future. The steps should include the action(s) for resolving the issue, the status of each action, and the date each action will be completed.
As an immediate measure, issuing certificates based on this validation method was stopped and the registration staff has been briefed.
In the next step, it was determined which active certificates and, thus, which customers were affected. The affected customers were informed about the upcoming revocation of the certificates within the next 24 hours and were asked to promptly request replacement certificates using another domain validation method.
After these immediate measures, a risk analysis was performed and it was determined that the process as a whole still ensured that no certificates with unauthorized domain names were issued.
Nevertheless, according to the state of knowledge at the time, there was probably a violation of the BR. For this reason and also due to the criticality of some affected websites, it was decided to extend the revocation deadline to a maximum of five days (according to BR#4.9.1.1 item 12).
As a further mid-term measure, the developers were instructed to adapt the description in the RA Frontend in order to avoid misunderstandings and confusion in the future.
Last but not least, it was decided not to correct the implemented method anymore, but to permanently discontinue the use of method BR#3.2.2.4.2, which is also in line with future plans: we are currently building a new platform which will replace our current service platforms and, according to the current status, will only include methods BR#3.2.2.4.4, BR#3.2.2.4.7, BR#3.2.2.4.12, BR#3.2.2.4.18 and BR#3.2.2.4.19.

Attached file certificates.csv

The timeline in Comment 2 is incomplete. When did the non-conformance begin?

The incident report doesn't explain the root cause of the incident. What processes does Telekom Security use to ensure their DCV implementations are compliant? Why did those processes not identify that this method was non-compliant? How will they be improved to prevent this from happening again? Is Telekom Security conducting a review of their other DCV methods to ensure they are compliant?

However, when implementing the process, the definition of "Random Value" according to BR#1.6 was not fully taken into account, i.e., the requirement that a random value shall be generated by the CA was not considered.

Except for the requirements which BR 1.3.2 explicitly allows to be delegated, the requirements specified in the BRs must be performed by the CA. It's rather concerning that you've identified only the definition of "Random Value" as prohibiting delegation. Are there any other requirements which Telekom Security has delegated to third parties?

After these immediate measures, a risk analysis was performed and it was determined that the process as a whole still ensured that no certificates with unauthorized domain names were issued. Nevertheless, according to the state of knowledge at the time, there was probably a violation of the BR. For this reason and also due to the criticality of some affected websites, it was decided to extent the revocation deadline to a maximum of five days (according to BR#4.9.1.1 item 12).

BR 4.9.1.1(5) requires revocation within 24 hours for domain validation failures. https://wiki.mozilla.org/CA/Responding_To_An_Incident states that 'Responses similar to "we do not deem this non-compliant certificate to be a security risk" are not acceptable' for delaying the revocation of a certificate. Therefore, a separate incident report needs to be created for this revocation delay.

Flags: needinfo?(Arnold.Essing)

Timeline Supplementation
The Method was implemented on Nov 22, 2019.

Root Cause
When implementing the method, the colleagues responsible for the affected Trust Service were convinced that the implementation was correct. At that time there was no further control authority except the internal auditor, who audited the Trust Services regularly but only on a random basis.

Quality improvement and implementation review
In 2020 the "Root Team", i.e., the team responsible for managing the public Root CAs as well as the requirements of the CA/Browser Forum and the Root Store Programs, was staffed with additional resources. The aim was, besides improving quality, to separate requirement and compliance management more strictly from implementation.
Since then, every change and new development in the implementation is approved by the Root Team. In turn, every change in the BRs, EVGL or Root Store Policies is discussed between the Root Team and the persons responsible for the affected Trust Services who implement the changes. Furthermore, if there are uncertainties, the Root Team discusses changes to the BRs etc. with other TSPs to get an independent opinion.
In parallel with the above-mentioned process of approving changes and new developments, the existing implementations have also been inspected. However, for the DV methods, it was only checked that no methods other than those allowed by the BR were used. The implementation of the DV methods was not checked in detail.
Within the context of this bug, all DV methods have now also been checked in detail and assessed as having been implemented correctly.

Generating the Random Value by Denic
Even if the generation of the Random Value as one single step of the process to validate the domains was done by Denic, the process as a whole was done by Telekom Security itself, so we see no delegation of the domain validation at this point, and also otherwise no domain validation is delegated to third parties.
Remark: The quality of the Random Values generated by Denic, i.e., the use of a CSPRNG as well as sufficient entropy, was ensured.

Revocation according to BR#4.9.1.1, item 12
As stated above, we determined that the validation process as a whole was sufficiently secure to ensure that the domain validation could still be relied upon. In addition, it should be mentioned that these are OV and EV certificates from validated subscribers, which are subject to a number of further validations and not just DV certificates from unknown applicants.
For this reason, we do not consider the revocation reason to fall under BR#4.9.1.1 item 5. Accordingly, we do not see a revocation delay, as we have revoked the affected certificates as soon as possible but within 5 days at the latest according to BR#4.9.1.1 item 12.
Remark: the referenced requirement from Mozilla https://wiki.mozilla.org/CA/Responding_To_An_Incident also states in the second paragraph under "Revocation": "This means that, in most cases of misissuance, the CA has an obligation under the BRs to revoke the certificates concerned within 24 hours, or 5 days in some cases."

Flags: needinfo?(Arnold.Essing)

We are monitoring this bug for feedback. Please let us know if there are any further comments or questions.

We are monitoring this bug for feedback. Please let us know if there are any further comments or questions.

We are monitoring this bug for feedback. Please let us know if there are any further comments or questions.
If there are no further comments or questions, we would like to propose closure of this bug.

I will close this bug on or about Wed. 24-May-2023.

Flags: needinfo?(bwilson)

Even if the generation of the Random Value as one single step of the process to validate the domains was done by Denic, the process as a whole was done by Telekom Security itself, so we see no delegation of the domain validation at this point, and also otherwise no domain validation is delegated to third parties.
Remark: The quality of the Random Values generated by Denic, i.e., the CSPRNG as well as the entropy were ensured.

How do you know that Denic generated Random Values with sufficient entropy from a CSPRNG? How do relying parties know this? Denic does not appear to be covered by Telekom Security's audit.

Flags: needinfo?(Arnold.Essing)

I'll await Telekom's response before considering closing this bug.

The quality of the random values has been checked via direct communication with Denic, i.e. the libraries and parameters used were evaluated and found to fulfill the requirements. However, this generation of random values by Denic was, indeed, not covered by Telekom Security’s audits.

Flags: needinfo?(Arnold.Essing)

Hi Andrew,
Does this address your concerns?
Thanks,
Ben

Flags: needinfo?(agwa-bugs)

I'm still concerned that despite a security-critical part of domain validation (generating the Random Value) being performed by an unaudited third party, Telekom Security concluded that the validations could be relied upon. Telekom Security (plus relying parties) have to take it on blind faith that the Random Values were securely generated. CAs should not be relying on the word of unaudited third parties.

More generally, I believe that ever since "any other method" domain validation was banned, CAs no longer have discretion to determine what domain validation procedures can be relied upon. They are only allowed to rely upon the enumerated methods. If evidence is found, as it was in this case, that a different procedure was used, that is evidence that the domain validation can't be relied upon, triggering a 24 hour revocation requirement, as it should given the security implications of domain validation.

Unfortunately, this is not the first time that a CA has taken this stance. Similar concerns were raised in Bug 1751984 Comment 4. At least in that case, the CA's discretion was backed by a reasonable security analysis; in this case, not so much. I think it would be beneficial for root programs to clarify the expectations for revocation timeframe when there's a violation of domain validation requirements, or better yet, change the BRs to explicitly require 24 hours.

Flags: needinfo?(agwa-bugs)

We are of the opinion that all the facts have been disclosed and we have nothing more to add. Therefore, we would like to propose closure of this bug.

I intend to close this on Wed. 7-June-2023. For comment #14 that "root programs clarify the expectations for revocation timeframe when there's a violation of domain validation requirements, or better yet, change the BRs to explicitly require 24 hours", this is a good suggestion and needs to be considered in future versions of root store policies or the BRs.

A few thoughts:

  • I tend to concur with Andrew's analysis. Unless I've misunderstood, the system used to generate the Random Values used to validate domains (and thereby authorize issuance of certificates with those domain names included) was not audited per BR and Root Program requirements. If it was not audited, it cannot be relied upon to perform functions under the scope of the audit. This seems to me a very clear and direct violation of BR 3.2.2.4.2 (and thus, in response to the incident, 4.9.1.1.(5)).
    • Further, it seems the entire process of BR 3.2.2.4.2 was delegated to DENIC. Not only was the Random Value generated by DENIC, it appears the validation of said Random Value was also performed by DENIC and, effectively, the CA was only reviewing domain registration information. It's not clear how this could be considered compliant with the BRs.
    • The description given does not inspire confidence that the domain validations could be relied upon, certainly not from the perspective of an external observer with no evidence to rely upon.
    • While I don't reach the same conclusions, based on the data supplied, that the domain validations performed were reliable, I do see the outcome of deprecating use of BR 3.2.2.4.2 to be a positive one.
  • It's encouraging to see this non-conformity was initially identified as part of the annual audit. It's deeply worrisome that the behavior was in place for such a prolonged period of time, as it appears to have been.
    • It appears that auditors were relying on a TSP-provided description of a CA system, rather than the actual behavior and function of the system, to determine compliance.
  • I had thought that the wording in BR 4.9.1.1.(5) was sufficiently clear to explicitly require revocation within 24 hours in cases where domain validation requirements have been violated.
    • If the components of performing a domain validation are not all present, that seems to me to qualify as a violation.
    • Unlike with internally defined controls, it is not my understanding that a CA can have compensating controls which obviate the need to meet the baseline requirements for performing domain validation.
  • In https://bugzilla.mozilla.org/show_bug.cgi?id=1825780#c5, it's mentioned that part of the assessment centered on the affected certificates being OV/EV certificates. It's not clear to me what validations, specific to OV/EV certificates, replace the need to validate domains. Specifically, a DV certificate with a properly validated domain is incomparably more reliable for the purpose of server authentication than an OV/EV certificate with improperly validated domains.

Thank you for the feedback. We agree in principle, i.e. the validation was not compliant with the BR, therefore we have revoked the affected certificates and permanently stopped the incorrectly implemented method.
However, we would still like to outline our rationale at that time for choosing to revoke within 5 days according to 4.9.1.1 (12):
We determined that the validation of the ".de" domains could be trusted because the validation was carried out in a secure process directly with DENIC, i.e. the central registry for all ".de" domains. DENIC established the process in 2018 according to the state of the art when implementing the GDPR and confirmed to us the use of a CSPRNG with sufficient entropy. Furthermore, as an operator of a critical infrastructure, DENIC is regularly audited according to ISO 27001 and 22301 in accordance with the German "BSI law" and the resulting KRITIS regulation.
Accordingly, our assessment was that DENIC is to be classified as trustworthy and that the process established by DENIC offers sufficient security to trust the control over the domains and exclude misissued certificates.
However, it is understandable for us that one may disagree with this assessment, especially due to the lack of integration in our auditing.
We did not interpret a deviation from the domain validation methods as automatically rendering the validation unreliable. We therefore welcome the suggestion of Andrew and Ben (see comment#14 and #16) to clarify the requirement to revoke within 24 hours in the BR and/or the Root Store Policies.
Regarding the OV/EV validation we would like to clarify that we did not want to replace the domain validation with an OV/EV validation, but that the organization data provided by DENIC in the domain validation was additionally compared with the organization data verified in the OV/EV validation.

We are monitoring this bug for feedback. Please let us know if there are any further comments or questions.

I am not sure that we'll get any further by keeping this matter open, so I intend to close it on Wed. 28-June-2023.

Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED