Closed Bug 1677737 Opened 5 years ago Closed 4 years ago

SwissSign: duplicate serial number

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: agwa-bugs, Assigned: michael.guenther)

Details

(Whiteboard: [ca-compliance] [ev-misissuance])

The CA "SwissSign EV Gold CA 2014 - G22" has issued two
certificates with the serial number
17:35:87:81:d4:e3:d5:2d:71:48:47:7f:43:f0:60:89:8a:79:3d:dc.

The first certificate: https://crt.sh/?sha256=9A301328C6653B779BE82E10D130DB882B705C71BE97F695DE76DEE7BA665B1B

The second certificate is presumed to exist based on the presence of
this precertificate:
https://crt.sh/?sha256=25247C9EAF762CF67096D2EA0B1AEB08312D2F7DE705D5C1FE6650A2F8429BEC

The certificates differ in the value of the Certificate Policies extension.

This appears to be an exact recurrence of the issue previously reported in Bug 1636140, and demonstrates the inadequacy of SwissSign's "solution", which was called out at the time as being risky: https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c6
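
For readers who want to reproduce this kind of finding, here is a minimal sketch of the check, not the reporter's actual tooling. It assumes certificates have already been parsed into plain dictionaries (a hypothetical data model) and that precertificates are signed directly by the issuing CA. The names, serial, and CPS URLs are placeholders. A precertificate and its certificate legitimately share a serial number, so the sketch ignores the two CT-specific extensions and flags only pairs that differ elsewhere.

```python
from collections import defaultdict

# OIDs defined by RFC 6962 for Certificate Transparency.
CT_POISON_OID = "1.3.6.1.4.1.11129.2.4.3"  # present only in precertificates
SCT_LIST_OID = "1.3.6.1.4.1.11129.2.4.2"   # present only in final certificates
CERT_POLICIES_OID = "2.5.29.32"            # Certificate Policies extension


def comparable_view(cert):
    """Fields that must be identical in a precertificate and its certificate."""
    extensions = tuple(sorted(
        (oid, value) for oid, value in cert["extensions"].items()
        if oid not in (CT_POISON_OID, SCT_LIST_OID)
    ))
    return (cert["subject"], extensions)


def find_conflicting_serials(certs):
    """Return (issuer, serial) pairs whose entries differ beyond the CT extensions."""
    views = defaultdict(set)
    for cert in certs:
        views[(cert["issuer"], cert["serial"])].add(comparable_view(cert))
    return sorted(key for key, seen in views.items() if len(seen) > 1)


# Hypothetical records: every value below is a placeholder, not real data.
precert = {
    "issuer": "Example EV CA", "serial": "0a1b", "subject": "CN=example.test",
    "extensions": {CT_POISON_OID: "", CERT_POLICIES_OID: "http://example.test/cps"},
}
leaf = {
    "issuer": "Example EV CA", "serial": "0a1b", "subject": "CN=example.test",
    "extensions": {SCT_LIST_OID: "scts", CERT_POLICIES_OID: "https://example.test/cps"},
}
matching_leaf = {
    "issuer": "Example EV CA", "serial": "0a1b", "subject": "CN=example.test",
    "extensions": {SCT_LIST_OID: "scts", CERT_POLICIES_OID: "http://example.test/cps"},
}

conflicts = find_conflicting_serials([precert, leaf])
no_conflicts = find_conflicting_serials([precert, matching_leaf])
```

Here `conflicts` contains the shared (issuer, serial) pair because the Certificate Policies values differ, while `no_conflicts` is empty because the pair differs only in the CT extensions.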

Assignee: bwilson → michael.guenther
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
Via a Bugzilla bug filed by Andrew Ayer on 17 November 2020.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
20201117 16:41 CET Received the Bugzilla notification
20201117 17:00 CET Analysis and confirmation of the issue
20201117 18:00 CET Checked for other misissuances -> no additional certificates found
20201117 19:30 CET Re-checked for other open precerts -> no additional precerts found
20201117 21:15 CET Informed the auditors
20201118 09:15 CET Management decision to implement a technical safeguard
20201118 14:40 CET Posted this report

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.
Yes, we can confirm that there is no other affected certificate.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.
The issue is with the X509v3 Certificate Policies extension (the CPS link refers to http instead of https).

5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
The only affected certificate (precert and leaf) is:
https://crt.sh/?sha256=9A301328C6653B779BE82E10D130DB882B705C71BE97F695DE76DEE7BA665B1B (leaf)
https://crt.sh/?sha256=25247C9EAF762CF67096D2EA0B1AEB08312D2F7DE705D5C1FE6650A2F8429BEC (precert)

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
The issue is directly connected to updating the X509v3 fields when we publish a new CP/CPS.
We updated our CP/CPS on 20201117 at ~14:00 CET, following our procedure introduced in 2020. However, the manual process did not hold up to our expectations and one precertificate was missed.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
The next steps are:

  • We will contact the customer today and revoke the certificate by this Friday, 20201120, at the latest (within 5 days).
  • Development of a user story to implement technical safeguards
  • We will implement a technical safeguard to automatically prevent such a problem from happening again

We will update this ticket when

  • Revocation is done
  • We have a solution for the user story available
  • The remediation measures are implemented

Revocation was done on Friday 20201120 at 17:01 UTC.

We will update this ticket when

  • We have a solution for the user story available
  • The remediation measures are implemented

Can you please provide a more detailed explanation about what happened? The best I can make out is "Things didn't happen like we expect them to happen", which, while factually true, provides very little useful insight or understanding to help prevent this, either at SwissSign or other CAs.

Specifically:

Development of user story to implement technical safeguards
We will implement a technical safeguard to automatically prevent such a problem from happening again

What went wrong, what are the technical safeguards that were implemented, and how do they relate to what went wrong? Helping us understand the flow is critical to being confident that the systemic issue is fixed (e.g. that any configuration changes made after issuing a pre-cert will prevent the corresponding certificate from being issued).

Flags: needinfo?(michael.guenther)

I am afraid that there is not much new insight to be gained. As you pointed out, the steps in https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c6 were not an optimal process (especially step 2). We believed that we could make it work, as we thought the process looked easy enough to handle (https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c8).

In retrospect, our risk assumption was wrong: in step 2.i. (https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c12), this one certificate was missed because the script was not run on this ICA, which resulted in the misissuance. The technical support for that step consisted of manually triggered queries.

Based on this outcome, there was little discussion, as we agreed that a non-manual implementation (instead of our work procedure approach) is needed (see point 7 of the initial post). Our DevTeam/process teams are still evaluating the options (including a change to close the time gap between precert and cert). We will update the ticket when our evaluation is complete.

The goal is to technically ensure that a future update of the CP/CPS will not result in a misissuance because of any Certificate Policies extension.

Flags: needinfo?(michael.guenther)

Mike: Thanks, especially for addressing Bug 1636140.

In terms of Incident Report expectations, bugs like that are something we'd call incredibly relevant to this issue, factoring in on the timeline of the report, and factoring in to the discussion about what steps were taken. Are there any other relevant bugs, either from SwissSign or other CAs, you'd like to highlight here as factoring in as things SwissSign was aware of? This is a rare opportunity to attempt to correct the incident report and provide a holistic picture of SwissSign operations now and going forward. I would encourage you to take this opportunity to correct any impressions, by having a detailed and thorough report, because right now, it seems like SwissSign was warned about a risk, chose to do it anyways, and had an incident, and that does not look very good at all.

Thank you very much for your feedback. It helps and adds to our continued internal efforts to improve our work processes and systems. We have taken the feedback on board and are working on it.
With regard to the case you are referring to, that includes the reassessment of our roadmap for corresponding improvements under this perspective on the management level. Of course we will provide further feedback on the resulting facts and insights.

As a follow-up to my comment 2:
We have decided to go with the following user story to solve the issue on two levels:
1. Data: Generate the final certificate (leaf) from the data of the pre-certificate instead of constructing it from scratch from the product definitions.
2. Timing: Generate the final certificate immediately upon receipt of the SCTs rather than waiting for user interaction.

The first part of the story ensures that mismatches between precert and cert, as in the case of this bug, are no longer possible. The second part ensures that certs are issued within a reasonable timeframe, so that no changes can happen in the meantime.
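
The two parts of the user story could look roughly like the following. This is a minimal sketch, not SwissSign's implementation: certificates are modelled as plain dictionaries, `issue` is a hypothetical signing hook, and all names and URLs are placeholders. The final certificate is copied from the precertificate, with only the poison extension removed and the SCT list added, so a product definition that changes in the meantime cannot leak into the leaf.

```python
import copy

# OIDs defined by RFC 6962 and RFC 5280.
CT_POISON_OID = "1.3.6.1.4.1.11129.2.4.3"  # precertificate poison extension
SCT_LIST_OID = "1.3.6.1.4.1.11129.2.4.2"   # embedded SCT list extension
CERT_POLICIES_OID = "2.5.29.32"            # Certificate Policies extension


def final_from_precert(precert, sct_list):
    """Data: derive the final certificate fields from the precertificate itself."""
    final = copy.deepcopy(precert)
    final["extensions"].pop(CT_POISON_OID, None)   # drop the poison extension
    final["extensions"][SCT_LIST_OID] = sct_list   # embed the received SCTs
    return final


def on_scts_received(precert, sct_list, issue):
    """Timing: issue as soon as the SCTs arrive instead of waiting for the user."""
    return issue(final_from_precert(precert, sct_list))


# Hypothetical scenario: the product definition changes after the precert exists.
product_definition = {CERT_POLICIES_OID: "http://example.test/cps"}
precert = {
    "serial": "0a1b",
    "subject": "CN=example.test",
    "extensions": {CT_POISON_OID: "", **product_definition},
}
product_definition[CERT_POLICIES_OID] = "https://example.test/cps"  # CP/CPS update

issued = on_scts_received(precert, "scts", issue=lambda fields: fields)
```

In this scenario `issued` still carries the Certificate Policies value that was in the precertificate, because the updated product definition is never consulted when building the leaf.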

We are currently working toward an early release date.

Regarding comments 3 and 5:
When we designed the workflow to mitigate the issue in https://bugzilla.mozilla.org/show_bug.cgi?id=1636140 we integrated the scripts we used to check for other possible 'open' precerts (https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c2 referencing 20200508 7:45 CEST).

This script was then used in step 2.i. (https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c12). The script is run with an ICA as a parameter.
What happened was that one ICA was not added to the list, which resulted in the misissuance reported in this bug (see our report above for further details).
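
To illustrate the failure mode, here is a minimal sketch under stated assumptions. It is not the script referred to above: the record layout, the function names, and the ICA names are all hypothetical. It shows how an open-precert check driven by a hand-maintained ICA list silently skips an ICA that is missing from the list, and how comparing the list against the issuers actually seen in the issuance data would surface the gap.

```python
def open_precerts(precerts, certs, icas):
    """Precerts from the listed ICAs that have no matching final certificate."""
    finalized = {(c["issuer"], c["serial"]) for c in certs}
    return [
        p for p in precerts
        if p["issuer"] in icas and (p["issuer"], p["serial"]) not in finalized
    ]


def uncovered_icas(precerts, icas):
    """ICAs that issued precerts but are absent from the configured list."""
    return sorted({p["issuer"] for p in precerts} - set(icas))


# Hypothetical issuance data: "ICA B" has an open precert but is not in the list.
precerts = [
    {"issuer": "ICA A", "serial": "01"},
    {"issuer": "ICA B", "serial": "02"},
]
certs = [{"issuer": "ICA A", "serial": "01"}]
configured = ["ICA A"]

found = open_precerts(precerts, certs, configured)
missing = uncovered_icas(precerts, configured)
found_all = open_precerts(precerts, certs, configured + missing)
```

With the incomplete list, `found` is empty even though one precert is still open. `missing` names the skipped ICA, and rerunning with it included returns the open precert.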

All of this triggered a re-evaluation of the original workflow we had put in place, which led to a long internal discussion and evaluation of effective remediation measures, ending in the user story in my post.

The original risk assumption and the designed workflow were based on the experience of the teams involved. While taking into account Andrew's (https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c10) and your (https://bugzilla.mozilla.org/show_bug.cgi?id=1636140#c6) concerns (or any other input and experience of the community), at the time of implementation we were convinced that it would work. We are continuing to improve and are taking into account any further input in order to streamline the processes.

On that note, if you have further input or remarks about our user story above, please share them with us, as the dev team is currently working on it.

Flags: needinfo?(bwilson)

I believe this incident can be closed, so I will call it up again on 7-April-2021 for closure unless there are any objections.

Has SwissSign deployed the changes described in Comment 7? If so, I believe this bug can be closed.

Flags: needinfo?(michael.guenther)

We had to push the release from March to the next month. The release is now scheduled for the last week of April 2021. This is in line with the publishing of the new CP/CPS in the upcoming weeks and will prevent the same issue from occurring again.

I suggest that the ticket be kept open until I can confirm the release.

Flags: needinfo?(michael.guenther)

The update was released last weekend and the whole system has been stable for a week. Therefore, I can confirm that the changes are deployed, and from my point of view this ticket can be closed.

I'll schedule this to be closed on 5-May-2021.

Ben: I believe you meant to close this? Closing it out based on Comment #13, but apologies if I misunderstood.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ev-misissuance]