Closed Bug 1651611 Opened 4 years ago Closed 3 years ago

Telekom Security: Finding in 2020 ETSI-Audit regarding weekly review of changes to configurations

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Arnold.Essing, Assigned: Arnold.Essing)

Details

(Whiteboard: [ca-compliance] [audit-finding])

1.How your CA first became aware of the problem, and the time and date.
During the audit, carried out from 2020-03-23 till 2020-04-01, it was identified that unauthorized changes to the configuration of one system were not covered by a weekly review due to the default monitoring solution not being applicable to the system.

2.A timeline of the actions your CA took in response.
2020-03-24: The non-conformity is identified during the audit.
2020-04-02 – 2020-04-03: A check is performed whether there are other affected systems. No findings.
2020-04-02 – 2020-04-09: Based on the log data of the affected system, a retrospective review of all changes is performed and completed. No findings.
2020-04-15: A weekly review of all configuration changes by qualified personnel is established (see point 7).
2020-07-08: The auditors inspect the new solution and determine that it fulfils the requirement (NCSSR version 1.3).

3.Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
N/A

4.A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
N/A

5.The complete certificate data for the problematic certificates.
N/A

6.Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
By default, all changes to systems are reviewed right after implementation. Apart from that, any changes to the configuration of any system, in particular changes violating the security policies, are detected by a continuous monitoring solution. The affected system is an appliance with limited customizability, which is why the agent of the monitoring solution could not be installed. The person responsible for the administration of the system was not properly informed (or misunderstood the instructions) and was therefore not aware of the exact requirement 1.h. (and 3.a.) of NCSSR v1.3. As a result, the person responsible for the implementation of the requirement assumed that a less frequent, but still regular, review of the configuration would be sufficient and did not implement an equivalent solution. He informed the persons responsible for the operation of the PKIs about the finished installation of the system but not about the missing monitoring. On the other hand, the persons responsible for the operation of the PKIs mistakenly assumed that the requirement was implemented correctly and did not ask for more detailed feedback.

7.List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
The log data of the affected system has been reviewed retrospectively. Additionally, all other systems were checked to detect similar occurrences of this mistake.
Due to the standard solution not being applicable to the affected system, an individual process for the review of all changes to the configuration of the affected system was established. This process is based on an automated script that checks the log data for any kind of changes and gathers that information. Qualified personnel (at least two separate persons) are required to actively confirm the correctness of the performed changes (or otherwise inform the ISMS, which will then further investigate the issue).

In order to prevent such mistakes from reoccurring in the future, the following steps have been taken:

  • The persons responsible for the operation of the PKIs usually give sufficiently precise instructions or simply forward the entire requirement(s), so that the persons responsible for the administration of the systems and the implementation of the requirements know exactly what is to be done or, at least, what is expected. In some cases, however, the instructions were simplified or summarized, which led to information being lost. Therefore, all persons responsible for the operation of the PKIs have been made aware of this potential source of error and are instructed to be sufficiently precise when communicating the requirements.
  • Even with the instructions being clearer, there is still a potential for error due to misunderstandings or plain human error by the person responsible for the implementation of the requirements. Therefore, the persons responsible for the operation of the PKIs are required to evaluate for themselves, based on the feedback of the person responsible for the implementation, whether the requirement is sufficiently met. In case of doubt, additional feedback must be requested. This means that a simple “Requirement has been implemented.” from the person responsible for the implementation of the requirement is no longer acceptable and must lead the person responsible for the operation of the PKI to ask for additional feedback, unless the requirement has no complexity at all (e.g. ). This procedural measure will be made mandatory by including it in the relevant documentation.

Please let us know if you require any further information.

Assignee: bwilson → Arnold.Essing
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

(In reply to Arnold Essing from comment #0)

I would like to understand all of this a lot more thoroughly, so I have added questions in-line below.

1.How your CA first became aware of the problem, and the time and date.
During the audit, carried out from 2020-03-23 till 2020-04-01, it was identified that unauthorized changes to the configuration of one system were not covered by a weekly review due to the default monitoring solution not being applicable to the system.

These configuration changes were not only not covered by the weekly review, but were also unauthorized?
Which system was it?
How long was the system subjected to this condition?
How many occurrences of unauthorized changes were there?

2.A timeline of the actions your CA took in response.
2020-03-24: The non-conformity is identified during the audit.
2020-04-02 – 2020-04-03: A check is performed whether there are other affected systems. No findings.
2020-04-02 – 2020-04-09: Based on the log data of the affected system, a retrospective review of all changes is performed and completed. No findings.
2020-04-15: A weekly review of all configuration changes by qualified personnel is established (see point 7).
2020-07-08: The auditors inspect the new solution and determine that it fulfils the requirement (NCSSR version 1.3).

The timeline should include activities/facts occurring before discovery. Additional facts/event timelines help give context to item 6 (Why things weren't discovered sooner?)
Why wasn't this incident reported to Mozilla/Bugzilla earlier?

3.Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
N/A

In lieu of explaining "certificates with a problem" the newer approach for incident reporting is to identify the process that was failing and to explain why the process was/was not stopped. (https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report) Also, many PKI standards require that processes be halted if auditing/logging is not functional. It does not appear that this approach was adopted. Why not?

4.A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
N/A

See the new incident reporting outline, referenced above.

5.The complete certificate data for the problematic certificates.
N/A
See the new incident reporting outline, referenced above.

6.Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
By default, all changes to systems are reviewed right after implementation. Apart from that, any changes to the configuration of any system, in particular changes violating the security policies, are detected by a continuous monitoring solution. The affected system is an appliance with limited customizability, which is why the agent of the monitoring solution could not be installed. The person responsible for the administration of the system was not properly informed (or misunderstood the instructions) and was therefore not aware of the exact requirement 1.h. (and 3.a.) of NCSSR v1.3. As a result, the person responsible for the implementation of the requirement assumed that a less frequent, but still regular, review of the configuration would be sufficient and did not implement an equivalent solution. He informed the persons responsible for the operation of the PKIs about the finished installation of the system but not about the missing monitoring. On the other hand, the persons responsible for the operation of the PKIs mistakenly assumed that the requirement was implemented correctly and did not ask for more detailed feedback.

Which affected system with limited customizability could not be monitored with the agent? I'd assume there are other appliances for which the agent won't work. Are they addressed with a similar solution? Are personnel for such systems provided with instruction materials that ensure future compliance with 1.h and 3.a of the NCSSRs? What is being done to close the communication gap / compliance risk caused when knowledge of compliance obligations is not effectively understood by those with operations responsibilities?

7.List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
The log data of the affected system has been reviewed retrospectively. Additionally, all other systems were checked to detect similar occurrences of this mistake.

Were any other similar systems identified as a result of the inventory conducted? If yes, what were they?

Due to the standard solution not being applicable to the affected system, an individual process for the review of all changes to the configuration of the affected system was established. This process is based on an automated script that checks the log data for any kind of changes and gathers that information. Qualified personnel (at least two separate persons) are required to actively confirm the correctness of the performed changes (or otherwise inform the ISMS, which will then further investigate the issue).

Can the appliance be upgraded to work with the agent? Has the manufacturer/vendor been contacted to see whether functionality can be upgraded? Are there other models of the appliance that provide automated configuration management?

In order to prevent such mistakes from reoccurring in the future, the following steps have been taken:

  • The persons responsible for the operation of the PKIs usually give sufficiently precise instructions or simply forward the entire requirement(s), so that the persons responsible for the administration of the systems and the implementation of the requirements know exactly what is to be done or, at least, what is expected. In some cases, however, the instructions were simplified or summarized, which led to information being lost. Therefore, all persons responsible for the operation of the PKIs have been made aware of this potential source of error and are instructed to be sufficiently precise when communicating the requirements.

As a matter of advice, this process should be replicated for all requirements-related matters. TeleSec should implement such communication processes, and others, to ensure compliance by operations personnel in other areas, not just this isolated area.

  • Even with the instructions being clearer, there is still a potential for error due to misunderstandings or plain human error by the person responsible for the implementation of the requirements. Therefore, the persons responsible for the operation of the PKIs are required to evaluate for themselves, based on the feedback of the person responsible for the implementation, whether the requirement is sufficiently met. In case of doubt, additional feedback must be requested. This means that a simple “Requirement has been implemented.” from the person responsible for the implementation of the requirement is no longer acceptable and must lead the person responsible for the operation of the PKI to ask for additional feedback, unless the requirement has no complexity at all (e.g. ). This procedural measure will be made mandatory by including it in the relevant documentation.

The persons responsible for implementation should also be held accountable to understand what they are required to do. They should be required to acknowledge that they understand the instructions provided by the persons responsible for the operation of the PKIs.

Please let us know if you require any further information.

As I said at the top, I really want to understand this incident better so that I can provide additional advice and close this bug. Any additional information you can provide will be appreciated.

Flags: needinfo?(Arnold.Essing)

No unauthorized changes
The translation was misleading due to the use of the term “unauthorized changes”. It should have said that unauthorized changes would not be discovered in a weekly review. The sentence was not meant to say that there were ever, in any form, unauthorized changes. There were exclusively authorized changes. All changes go through a change management system including all ITIL processes (approvals, QA tasks, 4-eyes principle, ...).
Based on the finding, we checked retrospectively if there were any unauthorized changes. There were no unauthorized changes.

Affected system
The affected system is a central networking component. Apart from this issue, the system complies with all security requirements and can only be managed by our own authorized administrators.
The affected system cannot be updated to work with our standard client, so we had to find an alternative solution. Our auditors inspected the new solution on July 8, 2020 and determined that it fulfilled the requirement (version 1.3 of NCSSR).
Immediately after this finding, we checked whether there were other affected systems. There were no other affected systems. The other systems mentioned in 1h and 3a of the Network Security Requirements are being monitored with our standard agent or with other comparable tools where the standard agent cannot be used.

Review of the system-configuration before discovery
Since the system was commissioned for the affected CAs in 2017, its configuration has been subject to at least an annual review in accordance with our standard processes. There have never been any inconsistencies or errors.

Continuation of operations
Based on the classification "NC-B", we assessed the non-conformity as not so critical to security that operation should have been stopped immediately. There were no other findings besides this NC-B, and the management of the certificates is based on numerous other security measures. From our perspective, the operation of the CA could therefore be continued.

Delayed Bug report
After checking whether there have been any unauthorized changes, we determined that there were no unauthorized changes and no security incidents, so we did not post a bug in this regard. Regarding publication of the bug for the non-conformity, we were under the impression that the correct time in this case would be upon receipt of the audit reports, including the detailed information, on June 29, 2020. The time period of the two-part audit was longer than expected due to the COVID-19 crisis. In retrospect, we realized that we should have published the bug earlier.

Future compliance
When there are new specifications (ballots, browser policies, ETSI requirements, ...), the core team meets regularly with the solution managers and operations teams concerned and monitors the implementation of the requirements. In doing so, accountability is clearly assigned. In order to implement this better and more effectively, our requirements management was redesigned at the start of 2020 and two new experts were brought in to support the core team.

Flags: needinfo?(Arnold.Essing)

our requirements management was redesigned at the start of 2020

Can you share more precise details here? In general, statements like "we redesigned the process" or "updated the documentation" don't provide much insight into the practices that work and don't work.

I think this sort of piggybacks on the "Delayed Bug Report", but also touches on the "Continuation of Operations". As Ben mentioned in Comment #1, the goal of the incident report (and its template) is to try to make sure we've got a holistic picture of the relevant facts, the impact, a clear understanding of how things used to work, and a new understanding of how they do work.

Some of the principles have been discussed in the past on mozilla.dev.security.policy, and you can see practical examples of how that plays out in incident reports.

I think your answers in Comment #2 are useful, but to Ben's point in Comment #1, it'd be useful to "redo" the report in Comment #0, following the template, working in these details to make sure we've structurally got a good perspective here.

For example, you refer to a variety of specific events ("Since the system was commissioned ... in 2017", "we assessed the non-conformity as not so critical", "After checking whether there have been", "our requirements management was redesigned at the start of 2020", "we checked retrospectively"), and all of those are good in the timeline, to make sure we've got a good chronological picture. Similarly, when it comes to "Future compliance", describing what was, and what is now, is equally important to help build the whole picture.

Flags: needinfo?(Arnold.Essing)

Dear Ben, dear Ryan,
Thank you for your feedback. As requested, we are providing a new incident report (based on the new template) and hope that this time we have included sufficient information.

1.How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
During our annual ETSI audit, carried out from 2020-03-23 to 2020-04-01, the auditors checked the implementation of the Network and Certificate System Security Requirements, including the weekly review of changes to configurations as required by NCSSR 1.h. Upon inspection of sample systems with regard to this requirement, one system (a central network component) was found not to be compliant: neither the standard solution nor a comparable solution was implemented to fulfil the requirement.

2.A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
2017: The system is commissioned. Since the system is an appliance with limited customizability, the standard client for monitoring of the configuration cannot be installed. Because the system is not yet part of a BR-relevant PKI, the comparable security measures taken do not include a weekly review of changes (see point 6 for further details).
2018-05-03: The first PKI subject to the Baseline Requirements/NCSSR is connected to the system, making it subject to the requirements of the BR. Due to communication mistakes (see point 6 for further details), the system is believed to be compliant with 1.h of NCSSR v1.3.
2020-01-01: The core team of the requirements management receives two new members, which allows responsibility for the implementation of requirements to be centralized.
2020-03-24: The problem is identified during an annual ETSI audit. A first assessment concludes that this finding alone is not critical to operational security, and therefore operation is not stopped (see point 3 for further details). The standard solution is not applicable because the affected system is an appliance, so a custom solution must be implemented (see point 7).
2020-04-01: In a preliminary summary by the auditors, we are informed that the finding will presumably be assessed as a non-conformity. However, we did not immediately open a bug report regarding this issue because we were under the impression that the correct time to report this, as well as potential other non-conformities, would be after receiving the audit report including all the detailed information and the official classification of the finding as a non-conformity. This assumption was further supported by the fact that we found other bug reports (regarding audit findings) that were opened after receiving the audit report or even later. Also, a verification from the auditors that the new solution fulfils the requirement (see point “2020-07-08” of this timeline) was deemed to be useful information to include in the bug report. In retrospect, we realize and understand that we should have published the bug report (or at least a preliminary report) way earlier.
2020-04-02 – 2020-04-03: A check is performed whether changes to the configurations of all other systems are monitored as required by 1.h of the NCSSR. There were no additional findings: either the standard client or a comparable solution was implemented to fulfil the requirement.
2020-04-02 – 2020-04-09: We also retrospectively reviewed all changes to the configuration of the system with the help of the log data. There were no findings of unauthorized changes to the configuration of the system.
2020-04-15: Since the standard client cannot be installed on the system, an alternative solution is implemented which fulfils the requirement (see point 7).
2020-06-29: Telekom Security receives a first draft of the audit report including the assessment of the finding as a non-conformity. Since the auditors will inspect the alternative solution a few days later (see next step in the timeline), the decision is made to provide the bug report afterwards.
2020-07-08: The auditors inspect the new solution and review the produced evidence. The new solution is determined to fulfil the requirements.
2020-07-09: We open this bug with the first version of this incident report.

3.Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.
There were no further findings and the system complies with all other security requirements. All changes to the system go through a change management system including all ITIL processes (approvals, QA tasks, 4-eyes principle, …) as enforced by our corporation, and the management of certificates is secured by numerous other security measures. Additionally, the retrospective check confirmed that there were no unauthorized changes and that there was still a regular review of the configurations in accordance with our standard processes. Therefore, we assessed the non-conformity as not critical enough to security to justify stopping operation or a particular process.

4.In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.
Only one system was affected by the finding. The problem was introduced in 2017/2018 and lasted until the finding in 2020. Other systems on which the standard client cannot be installed have alternative solutions implemented. This was reconfirmed by an additional check of all systems after the finding (as stated in the timeline).

5.In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases that are not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
The affected system is an internal central network component only accessible via the administration network by authorized and trusted personnel from our own team, based on 4-eyes principle and other security measures.

6.Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
Pre-Info: By default, all changes to systems are reviewed right after implementation as part of the established change management. Apart from that, a continuous monitoring solution is implemented which detects any changes made to a system or its configuration. Should it not be possible to install this standard solution on a system, e.g. due to limited customizability of an appliance, other measures are taken to ensure a sufficient security level. In the case of systems that are subjected to the Baseline Requirements/NCSSR, these measures must also comprise a (weekly) review of the changes, while on other systems (e.g. for private and other PKIs with different infrastructure and requirements) this might not be necessary.
The affected system was commissioned in 2017 and was not part of any PKI subject to the Baseline Requirements at that point. Since the standard solution could not be implemented, other security measures were implemented, which did not comprise a weekly (or “continuous”, as of v1.4 of the NCSSR) review of the changes. In May 2018, the first PKI components subject to the Baseline Requirements were connected to the affected system, making it relevant with regard to the Baseline Requirements/NCSSR. At that point in time, the administrators of that system were not properly informed (or misunderstood the instructions) and were not aware of the exact requirement. Therefore, they assumed that the less frequent review of the configuration and the other security measures would still provide sufficient security. When they confirmed the successful configuration of the system, that confirmation did not contain details about what exactly had been implemented. On the other hand, the persons responsible for the operation of the PKI (who also know about the relevant requirements) mistakenly assumed that the requirement was implemented correctly, as for all other systems. They did not consider asking for more detailed feedback in order to evaluate this themselves.
Regarding how the problem avoided detection: Due to the communication mistake, both parties responsible for the system (the operator of the PKI/product on the one hand and the administrators on the other) assumed that the system was correctly configured. Also, when the implementation of the requirement was checked in internal and external audits, this specific system was not part of the samples until the audit in 2020.

7.List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
The log data of the affected system has been reviewed retrospectively. Additionally, all other systems were checked for similar occurrences of this mistake. Both checks resulted in no additional findings.
Due to the standard solution not being applicable, an alternative process for the review of all changes to the configuration of the affected system was established. This process is based on an automated script checking the log data for any kind of changes. Qualified personnel are then informed and at least two different persons are required to actively confirm the correctness of the performed changes. If necessary, the ISMS is informed and will further investigate the issue.
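For illustration only, the core of such a review script could look roughly like the following sketch. The file paths, log line format, and ticket layout are simplified placeholder assumptions rather than details of the actual implementation:

#!/usr/bin/env python3
# Illustrative sketch only: weekly configuration-change review with a
# two-person confirmation step. Paths, log format, and ticket layout are
# hypothetical assumptions, not the actual implementation.

import json
import re
from datetime import datetime, timedelta
from pathlib import Path

LOG_FILE = Path("/var/log/central/appliance.log")   # assumed export from the central log server
REVIEW_DIR = Path("/srv/config-review")              # assumed working directory for review tickets
CHANGE_PATTERN = re.compile(r"\bconfig(uration)?\b.*\b(changed|modified|set|deleted)\b", re.IGNORECASE)


def collect_changes(since: datetime) -> list:
    """Extract every log entry after `since` that looks like a configuration change."""
    changes = []
    for line in LOG_FILE.read_text(encoding="utf-8").splitlines():
        # Assumed line format: "<ISO timestamp> <message>"
        timestamp_str, _, message = line.partition(" ")
        try:
            timestamp = datetime.fromisoformat(timestamp_str)
        except ValueError:
            continue  # skip lines that do not start with an ISO timestamp
        if timestamp >= since and CHANGE_PATTERN.search(message):
            changes.append({"timestamp": timestamp_str, "entry": message})
    return changes


def write_review_ticket(changes: list) -> Path:
    """Write a ticket that at least two qualified reviewers must confirm (or escalate to the ISMS)."""
    ticket = {
        "generated": datetime.now().isoformat(timespec="seconds"),
        "changes": changes,
        "confirmations": [],        # to be filled by at least two different qualified persons
        "escalate_to_isms": False,  # set to True if a change cannot be confirmed as correct
    }
    REVIEW_DIR.mkdir(parents=True, exist_ok=True)
    out = REVIEW_DIR / f"review-{datetime.now():%Y-%m-%d}.json"
    out.write_text(json.dumps(ticket, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    week_ago = datetime.now() - timedelta(days=7)
    print(f"Ticket written to {write_review_ticket(collect_changes(since=week_ago))}")

The generated ticket deliberately starts with an empty confirmation list, so the review is only complete once at least two qualified persons have signed it off.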
We determined that the main reason for the occurrence of the mistake was insufficient communication and partially unclear accountability. Therefore, in order to prevent such mistakes from reoccurring in the future, the following actions have been taken:
• The persons responsible for the operation of the PKIs usually give sufficiently precise instructions or simply forward the entire requirement(s), so that the persons responsible for the administration of the systems and the implementation of the requirements know exactly what is to be done or, at least, what is expected. In some cases, however, the instructions were simplified or summarized, which led to information being lost, or the person implementing the requirement simply misinterpreted or misunderstood them. Therefore, all persons responsible for the operation of the PKIs have been made aware of this potential source of error and are instructed to be sufficiently precise when communicating the requirements. Obviously, persons responsible for implementation must request additional information should they not understand a requirement or not be aware of the requirements. If no further questions or requests for information are made, this is considered an acknowledgement that the instructions were understood.
• However, even with the instructions being more precise, there is still a potential for error due to misunderstandings or plain human error (even if the administrator assumes to have understood the requirement). Therefore, based on the feedback from the administrator, the persons responsible for the operation of the PKIs are, themselves, required to evaluate whether the requirement is sufficiently met. In case of doubt, additional feedback must be asked for.
• At the start of 2020, we started partially redesigning our requirements management after two new members had been added to the core team. Previously, the management of requirements was mainly distributed to and performed by the different product/solution managers. Now, more and more requirements are centrally managed by the core team itself (thanks to the two new members/resources), and the core team also takes accountability for the correct implementation. Additionally, the database of requirements is being entirely rebuilt to increase quality and manageability.
While the redesign of the requirements management is not an action taken in response to the issue described in this bug report, we still think that it will have a significant influence on such issues and that it will help prevent many potential mistakes in the future. This includes the timely reporting of findings via Bugzilla.

We hope that we managed to incorporate your feedback with this new incident report.
Please let us know if you require any further information.

Flags: needinfo?(Arnold.Essing)

Thanks Arnold. Comment #4 certainly helps the "big picture" here, but I think some things stand out to me as worth thinking about.

The log data of the affected system has been reviewed retrospectively.

Can you describe how this log data is maintained? For example, in DigiNotar's case, log data was stored on the affected systems, and was erased. What I'm trying to encourage here is, if we think adversarially, what would an attacker need to do to evade detection here?

It sounds like, based on your description, this was an online system, so we can't, for example, examine access to the room for configuration. I'm hoping you can share more about how that reconciliation happened, and why it's trustworthy.

In retrospect, we realize and understand that we should have published the bug report (or at least a preliminary report) way earlier.

That understanding is good, but I'm not sure I see a clear understanding of how that new understanding has led to improved processes that ensure prompter reporting. Am I correct in assuming that Telekom Security is implying it's impossible for a failure to promptly disclose to ever happen again? How are the controls designed to ensure this?

This process is based on an automated script checking the log data for any kind of changes. Qualified personnel are then informed and at least two different persons are required to actively confirm the correctness of the performed changes. If necessary, the ISMS is informed and will further investigate the issue.

This seems like a negative case; or, as the saying goes, "the absence of evidence is not the evidence of absence". For example, is it possible for someone to make a change that disables running this automated script? Why or why not? It sounds like the current approach to configuration management is to rely on the system to honestly report about its own health, but how well does that hold up under an adversarial model? For example, in the report, it's stated:

The affected system is an internal central network component only accessible via the administration network by authorized and trusted personnel from our own team, based on 4-eyes principle and other security measures.

And it might be that you examined connections from the administration network to this device from another system (e.g. a firewall or network logger), matched those connections to authorized changes, and thus reconciled them to conclude that there were no unauthorized changes.
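To make that concrete, here is a rough sketch of the kind of reconciliation I have in mind; the data sources, field layout, and values are entirely hypothetical:

# Illustrative sketch only: reconcile administration-network connections seen by an
# independent device (firewall/network logger) against approved change windows from
# the change-management system. All data and field layouts are hypothetical.

from datetime import datetime, timedelta

# Assumed export from a firewall or network logger: (connection time, administrative host)
connections = [
    (datetime(2020, 4, 6, 10, 15), "admin-host-01"),
    (datetime(2020, 4, 6, 23, 40), "admin-host-07"),
]

# Assumed export from the change-management system: (approved start, approved end, host)
approved_changes = [
    (datetime(2020, 4, 6, 10, 0), datetime(2020, 4, 6, 12, 0), "admin-host-01"),
]


def unreconciled(connections, approved_changes, slack=timedelta(minutes=15)):
    """Return every connection that cannot be matched to an approved change window."""
    suspicious = []
    for ts, host in connections:
        matched = any(
            start - slack <= ts <= end + slack and host == change_host
            for start, end, change_host in approved_changes
        )
        if not matched:
            suspicious.append((ts, host))
    return suspicious


for ts, host in unreconciled(connections, approved_changes):
    print(f"UNRECONCILED: {host} connected at {ts:%Y-%m-%d %H:%M} outside any approved change window")

The point is that the evidence of absence comes from a system the appliance itself cannot tamper with.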

What I'm asking, I suppose, is to shift the thinking from "compliance" and into "defense", and perhaps help understand what sort of sources of data, and how they're designed, support the conclusion that things are fine.

Flags: needinfo?(Arnold.Essing)

Dear Ryan,
Thanks for your valuable feedback. We will keep your advice in mind and consider it when evaluating errors, incidents and problems in future reports. Regarding your remaining concerns, we hope to have addressed them with our answers below.

Trustworthy log data
All log data of all systems is immediately sent to a central, state-of-the-art and tamper-proof log server. In order to evade detection, an attacker would therefore be required to compromise this server as well and be capable of breaking its tamper-proof mechanisms or breaking modern cryptography itself. Alternatively, the attacker could stop the affected system from sending its log data to this central log server (by disabling this function or manipulating the network), but the actions taken to achieve this goal would themselves still be sent to the log server. Since we found no evidence of either of these scenarios, we concluded that the review of the log data can be trusted to detect manipulations or to show that there were no unauthorized changes.
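To illustrate the general principle (this is a generic sketch, not a description of the specific log server product we use), a hash-chained log makes any edit or deletion of earlier entries detectable:

# Illustrative sketch only: a hash chain is one common way to make a log
# tamper-evident. Generic example, not the mechanism of the actual log server.

import hashlib


def chain_entry(previous_digest: str, message: str) -> str:
    """Digest of a new entry, cryptographically bound to the previous digest."""
    return hashlib.sha256((previous_digest + message).encode("utf-8")).hexdigest()


def verify_chain(entries, genesis: str = "") -> bool:
    """entries: list of (message, stored_digest). Any edit or deletion breaks the chain."""
    digest = genesis
    for message, stored_digest in entries:
        digest = chain_entry(digest, message)
        if digest != stored_digest:
            return False
    return True


# Build a tiny chain, then tamper with one entry to demonstrate detection.
chain, digest = [], ""
for message in ["config set ntp-server", "admin login", "config save"]:
    digest = chain_entry(digest, message)
    chain.append((message, digest))

assert verify_chain(chain) is True
chain[1] = ("attacker login", chain[1][1])  # altered message, original digest kept
assert verify_chain(chain) is False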

Protection of automated script
The script reports all changes, not just “suspicious” or other specific data. Therefore, in our opinion, the script itself does not fall under the saying you mentioned. However, you are obviously correct in assuming that an attacker could disable the script and thereby prevent evidence from being generated in the first place. To prevent this from happening undetected, we protect the script's integrity with our standard monitoring solution (the one mentioned in the incident report above, which could not be applied to the affected system itself). As a result, an attacker would be required not only to disable the script but also to gain control of our central monitoring system in order to prevent an alert in case of unauthorized changes.
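For illustration only, a minimal integrity check of the kind such a monitoring agent could perform on the review script might look like the following sketch; the paths and the alerting hook are placeholder assumptions, not our actual configuration:

# Illustrative sketch only: a baseline hash comparison of the kind a file-
# integrity monitor could run against the review script. Paths and the alert
# channel are hypothetical assumptions.

import hashlib
from pathlib import Path

SCRIPT = Path("/usr/local/bin/config-change-review.py")             # assumed location of the review script
BASELINE = Path("/var/lib/monitoring/config-change-review.sha256")  # assumed stored baseline digest


def alert(reason: str) -> None:
    # Stand-in for the central monitoring system's alerting channel.
    print(f"ALERT: {reason}")


def check_script_integrity() -> bool:
    """Alert if the review script is missing or differs from its recorded baseline."""
    if not SCRIPT.exists():
        alert("review script missing - possible attempt to disable the weekly review")
        return False
    current = hashlib.sha256(SCRIPT.read_bytes()).hexdigest()
    if current != BASELINE.read_text(encoding="utf-8").strip():
        alert("review script modified outside of change management")
        return False
    return True


if __name__ == "__main__":
    check_script_integrity()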

Ensuring prompt reporting
Regarding the incident at hand, the actual failure happened within the core team of the compliance management (which is also responsible for the communication in this forum) itself. In that team, we wrongly decided to delay the disclosure due to a lack of knowledge/experience. Based on the lessons learned and the experience gained in handling/reading/evaluating this bug and other bugs in this forum, the core team now knows better and will not delay the disclosure of an incident report, or at least a preliminary report, in the future. Besides that, we currently do not see how other controls or process improvements could further help in this case, since the core team was informed the moment the non-conformity was detected. Although we currently do not see a scenario that could potentially lead to another failure, we obviously cannot ensure that a failure to promptly disclose is “impossible” to ever happen again. But we do think it is very unlikely.
We are aware that this type of answer might be a slightly unsatisfying “human errors happen, sorry”. However, based on the explanations above, we ask for your understanding. If you think this is not sufficient, we would kindly ask what other measures you would expect.

Flags: needinfo?(Arnold.Essing)
Flags: needinfo?(bwilson)

I don't have any further questions and would like to close this bug on or about next Wed. 27-Jan-2021. Does anyone else have any other follow up questions?
Thanks, Ben

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [audit-finding]