Closed Bug 1671113 Opened 4 years ago Closed 4 years ago

SwissSign: Failure to provide a preliminary report within 24 hours.

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fozzie, Assigned: michael.guenther)

Details

(Whiteboard: [ca-compliance] [disclosure-failure])

I sent a problem report to info@swisssign.com and have yet to receive a preliminary report:

Tuesday 13 October 08:54 UTC - I sent a report concerning the certificates mentioned in bug 1670894.

As of Wednesday 14 October 09:00 UTC I have yet to receive a response.

Assignee: bwilson → michael.guenther
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

As reported in the timeline of https://bugzilla.mozilla.org/show_bug.cgi?id=1670894#c1, we replied to George's mail on 20201014 11:41 CEST.

We realize that this was 24 hours and 47 minutes after the report, which is slightly over the required 24 hours.

The responsible person was involved in the analysis of Bugzilla https://bugzilla.mozilla.org/show_bug.cgi?id=1670894#c1 and prioritized that incident.

We will repeat the training with the involved support team to ensure that we respond within 24 hours.
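The delay figure can be checked from the timestamps in this bug: the problem report mail arrived at 08:54 UTC (10:54 CEST) on 13 October and the reply was sent at 11:41 CEST on 14 October. A minimal Python sketch, assuming CEST is UTC+2 on both dates:

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is UTC+2 on 13 and 14 October 2020.
CEST = timezone(timedelta(hours=2))

report_received = datetime(2020, 10, 13, 10, 54, tzinfo=CEST)  # 08:54 UTC
first_reply = datetime(2020, 10, 14, 11, 41, tzinfo=CEST)

# Window for a preliminary report on a certificate problem report.
DEADLINE = timedelta(hours=24)

elapsed = first_reply - report_received
overrun = elapsed - DEADLINE

print(elapsed)  # 1 day, 0:47:00
print(overrun)  # 0:47:00
```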

Per "Responding To An Incident", section "Incident Report":

For example, it’s not sufficient to say that “human error” or “lack of training” was a root cause for the incident, nor that “training has been improved” as a solution. While a lack of training may have contributed to the issue, it’s also possible that error-prone tools or practices were required, and making those tools less reliant on training is the correct solution. When training or a process is improved, the CA is expected to provide specific details about the original and corrected material, and specifically detail the changes that were made, and how they tie to the issue. Training alone should not be seen as a sufficient mitigation, and focus should be made on removing error-prone manual steps from the system entirely.

Will SwissSign be providing a full incident report for this bug?

Flags: needinfo?(torsten.kahlstadt)

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

Bug 1670894, posted by George on 20201013.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

20201013 10:54 CEST Mail from George informing us about the misissuance
20201013 02:12 PDT Posting by George
20201013 Internal investigation started
20201014 11:41 CEST Responding to mail
20201014 17:40 CEST Posting incident report for bug 1670894
20201016 17:56 CEST Posting to this ticket
20201016 09:00 PDT Posting by George to this ticket

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Not applicable (no certificates involved).

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Not applicable (no certificates involved).

5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

Not applicable (no certificates involved).

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

As reported in the timeline of https://bugzilla.mozilla.org/show_bug.cgi?id=1670894#c1, we replied to George's mail on 20201014 11:41 CEST.
We realize that this was 24 hours and 47 minutes after the report, which is slightly over the required 24 hours.
The responsible person was involved in the analysis of Bugzilla https://bugzilla.mozilla.org/show_bug.cgi?id=1670894#c1 and prioritized that incident.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

1.) It was a human error. We will repeat the training with the involved support team to ensure that we respond within 24 hours.

While the wiki states that "human error" is not a sufficient root cause, in this case it was the cause. The process of responding to incidents requires human interaction by definition, so automation is not possible. The answer was being drafted while the analysis was still ongoing.

Flags: needinfo?(torsten.kahlstadt)

While analysing a problem report in this case does require human interaction, the notification does not. Has SwissSign looked at how other CAs are automating the problem reporting notifications after an agent determines the outcome of the problem report, such as DigiCert with bug 1649277 and bug 1639801? Could SwissSign implement such a system as well?

Flags: needinfo?(torsten.kahlstadt)

Final Report SwissSign

Even though the wiki states that "human error" is not a sufficient explanation, that is unfortunately what happened here: it was a human error. However, SwissSign is constantly striving to improve.

  1. We will repeat the training with the involved support team to ensure that we respond within 24 hours.
  2. SwissSign will examine supporting software such as that used by DigiCert. However, SwissSign’s understanding is that automation is not a requirement, and we will assess the benefits versus the risks of automated revocation.
  3. SwissSign constantly analyses and optimises its processes.

Thank you very much for your support and advice. It positively supports our improvement process, which is very helpful for us.

Flags: needinfo?(torsten.kahlstadt)

It's concerning that SwissSign isn't initially interested in automation because it isn't a requirement. We've seen this before in bug 1551364, where SwissSign said that it would not implement any restrictions on values in the ST or L fields; this led to bug 1613334, which eventually led to bug 1670894. How will repeating the training prevent this issue from recurring, given that I presume SwissSign has already trained its staff on this requirement?

SwissSign still hasn't mentioned what this retraining entails or what the training was previously.

Flags: needinfo?(michael.guenther)

With the last update of our CP/CPS, SwissSign introduced a dedicated reporting mail address (certificatemisuse@swisssign.com), which is under the responsibility of our Manager on Call team (MOC). This ensures that all problem reports sent to this dedicated address are easier to recognize and are reviewed on time. All MOCs received training on Bugzilla rules and requirements, with a stronger emphasis on providing a preliminary report within 24 hours of receiving a problem report. Additionally, the MOCs remain fully responsible until the incident is closed and orchestrate all internal tasks.

Further automation will be considered as part of ongoing improvements.

Flags: needinfo?(michael.guenther)
Flags: needinfo?(bwilson)

Hi Mike,
I am inclined to close this matter, but before I do, I would like to know whether any additional steps might be implemented. For example, one CA that experienced a similar delay in acknowledging certificate problem reports implemented an automated response template.
Thanks,
Ben

Flags: needinfo?(michael.guenther)

Hi Ben,
As written before, we aim for stronger automation. While discussing this case internally, we modeled a process for the MoCs that takes the DigiCert case into consideration. The dedicated mail address, which is used exclusively for CA compliance topics, was the first step.

By the end of February, we intend to implement an automated pre-report mail to the reporter, triggered within 24 hours.
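A minimal sketch of how such an automated pre-report could be wired, assuming it is sent as soon as a report arrives at the dedicated address rather than waiting for the analysis to finish. Every name here (`ProblemReport`, `send_pre_report`, `overdue`, the mail text) is hypothetical and does not describe SwissSign's actual system; `send_mail` stands in for whatever mail transport is used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional

# Hypothetical sketch: names and mail text are illustrative assumptions.
PRELIMINARY_DEADLINE = timedelta(hours=24)


@dataclass
class ProblemReport:
    reporter: str
    received_at: datetime
    pre_report_sent_at: Optional[datetime] = None

    @property
    def deadline(self) -> datetime:
        # Latest time by which the reporter must have heard back.
        return self.received_at + PRELIMINARY_DEADLINE


def send_pre_report(report: ProblemReport, now: datetime,
                    send_mail: Callable[[str, str], None]) -> bool:
    """Send the automated pre-report once; return True if it went out in time."""
    if report.pre_report_sent_at is None:
        body = (f"We received your problem report at {report.received_at.isoformat()} "
                "and are investigating it. A detailed response will follow.")
        send_mail(report.reporter, body)
        report.pre_report_sent_at = now
    return report.pre_report_sent_at <= report.deadline


def overdue(reports: List[ProblemReport], now: datetime) -> List[ProblemReport]:
    """Reports with no pre-report whose 24-hour deadline has already passed."""
    return [r for r in reports if r.pre_report_sent_at is None and now > r.deadline]
```

Triggering the mail on receipt keeps the 24-hour clock independent of how long the human analysis takes; the `overdue` check is an assumed safety net that a monitoring job could run periodically.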

Additionally, we are changing the default revocation setting: a revocation ticket is created at the same time as the reporting ticket. The revocation ticket is a task for the support team to revoke the certificate(s) within the required 5 days unless the analysis by the MoC or the second control instance (an independent team) stops it (e.g. if the analysis shows that the report is incorrect and the certificate was not misissued).
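The "revoke by default" setting can be sketched as follows, assuming the revocation ticket is opened together with the reporting ticket and can be stopped only by the MoC analysis or the independent second control. The class, function, and role names are hypothetical, and the 5-day window from the comment above is treated as a fixed constant.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical sketch of "revoke by default" ticketing; all names are assumptions.
REVOCATION_DEADLINE = timedelta(days=5)
ROLES_ALLOWED_TO_STOP = ("MoC", "second control")


@dataclass
class RevocationTicket:
    certificate_ids: List[str]
    due: datetime
    stopped_by: Optional[str] = None
    reason: str = ""

    def stop(self, who: str, reason: str) -> None:
        """Only the MoC analysis or the independent second control may stop it."""
        if who not in ROLES_ALLOWED_TO_STOP:
            raise PermissionError(f"{who!r} may not stop a revocation ticket")
        self.stopped_by, self.reason = who, reason

    @property
    def should_revoke(self) -> bool:
        # Revocation proceeds unless someone authorized has stopped it.
        return self.stopped_by is None


def open_tickets(received_at: datetime,
                 certificate_ids: List[str]) -> Tuple[dict, RevocationTicket]:
    """Create the reporting ticket and, at the same time, a revocation ticket."""
    reporting = {"received_at": received_at, "certificates": list(certificate_ids)}
    revocation = RevocationTicket(list(certificate_ids), received_at + REVOCATION_DEADLINE)
    return reporting, revocation
```

Making revocation the default inverts the failure mode: if the analysis stalls, the certificates are still revoked on time instead of silently missing the deadline.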

In the coming months, we plan to implement further improvements, but as of today I can’t commit to specific tasks or dates. I hope this additional information shows what we are working towards.
Mike

Flags: needinfo?(michael.guenther)

I think this matter can be closed because SwissSign will be implementing automation and other items mentioned in Comment #9. I will close this on or about Friday, 12-Feb-2021, unless anyone thinks we should keep this open for additional follow-ups.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [disclosure-failure]