Closed Bug 1665688 Opened 1 year ago Closed 1 year ago

eMudhra: emSign CA ECC Test Certificate Misissuance

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: vijay, Assigned: vijay)

Details

(Whiteboard: [ca-compliance])

Issue: Recently issued SSL/TLS certificates using the ECC algorithm contained the keyEncipherment Key Usage bit, which must not be asserted for ECDSA keys. The CA has not issued ECC certificates to external customers; the impacted certificates are therefore limited to the 4 active certificates configured for our test websites (plus other test certificates that had already been revoked or had expired).
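To illustrate the class of check that would have caught this (all names and data structures below are hypothetical sketches, not eMudhra's actual system): per RFC 5480 / RFC 8813, the keyEncipherment and dataEncipherment bits must not be asserted for id-ecPublicKey (ECDSA) subject keys; keyEncipherment is only meaningful for RSA key transport.

```python
# Hypothetical profile-level Key Usage sanity check (illustrative only).
# Per RFC 5480 / RFC 8813, keyEncipherment and dataEncipherment MUST NOT
# be asserted for ECDSA (id-ecPublicKey) keys.

ALLOWED_KEY_USAGES = {
    "RSA":   {"digitalSignature", "contentCommitment",
              "keyEncipherment", "dataEncipherment"},
    "ECDSA": {"digitalSignature", "contentCommitment", "keyAgreement"},
}

def check_key_usage(key_algorithm: str, key_usages: set) -> list:
    """Return the Key Usage bits not allowed for the given key algorithm."""
    allowed = ALLOWED_KEY_USAGES[key_algorithm]
    return sorted(key_usages - allowed)

# The misissued profile: Key Usage values cloned from an RSA profile
# but applied to ECC certificates.
violations = check_key_usage("ECDSA", {"digitalSignature", "keyEncipherment"})
# violations == ["keyEncipherment"]
```

A check of this shape, run against every profile before it is enabled, flags an RSA-style Key Usage set the moment it is attached to an ECC profile.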

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
Through a problem report submitted to us on 10-Sep-2020 09:19 (Indian Time).

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
(Times are provided in Indian Time.)
20-Aug-2020 00:00: Configuration changes made to the certificate issuance procedure to generate test certificates for test URLs.
21-Aug-2020 04:00: First set of test certificates issued.
28-Aug-2020 00:00: First set of test certificates revoked for configuring with different CT Log servers.
28-Aug-2020 00:00: Second set of test certificates issued, configured with different CT Log servers.
10-Sep-2020 09:19: Problem reported on Key Usage issue in ECC test certificates.
10-Sep-2020 10:30: Analysis of Misissuance started.
10-Sep-2020 14:30: Incident flagged. Non-critical. No impact to external customers. Only internal test certificates impacted.
10-Sep-2020 15:00: Changes made in the system to correct the configurations.
10-Sep-2020 15:15: New test certificates issuance started.
10-Sep-2020 16:00: Test URL configuration for new test certificates started.
11-Sep-2020 16:00: New test certificates issuance completed and verified.
11-Sep-2020 16:00: Test URL configuration for new test certificates completed. Impact to external users accessing test URLs mitigated successfully.
12-Sep-2020 00:00: Independent lint tests carried out on each test certificate.
14-Sep-2020 16:00: Incident Analysis reviewed.
14-Sep-2020 21:00: Revocations Completed.
15-Sep-2020 11:00: Analysis of Misissuance Completed.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.
The CA has stopped the misissuance and corrected the underlying configuration.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.
Date the first certificate was issued: 21-Aug-2020
Date the last certificate was issued: 28-Aug-2020
A total of 24 problematic certificates were issued (including active, revoked, and expired), of which 20 had already been revoked or had expired before the issue came to notice; the remaining 4 active certificates were revoked as described in the timeline in point #2 above.

5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
https://crt.sh/?id=3269114628
https://crt.sh/?id=3269126739
https://crt.sh/?id=3269147978
https://crt.sh/?id=3269161978
https://crt.sh/?id=3269183426
https://crt.sh/?id=3269211713
https://crt.sh/?id=3269221956
https://crt.sh/?id=3269240727
https://crt.sh/?id=3269260621
https://crt.sh/?id=3269271242
https://crt.sh/?id=3269323597
https://crt.sh/?id=3269293197
https://crt.sh/?id=3300799444
https://crt.sh/?id=3300815759
https://crt.sh/?id=3300831857
https://crt.sh/?id=3300884469
https://crt.sh/?id=3300892055
https://crt.sh/?id=3300921028
https://crt.sh/?id=3300988998
https://crt.sh/?id=3301000571
https://crt.sh/?id=3301065208
https://crt.sh/?id=3301101352
https://crt.sh/?id=3301113384
https://crt.sh/?id=3301120909

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
This resulted from a misconfiguration of one particular procedure for the manual issuance of test certificates, in which the certificate profile contained the wrong Key Usage value. This procedure was not exposed to any online certificate issuance system and was part of manual processing for test-certificate generation. Being an exceptional process for internal use, the misconfiguration went unnoticed.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
This matter is treated with the highest importance. All manual procedures have been updated in our internal operating procedures to perform additional checks after certificate generation, specifically for cases of this kind. The necessary re-training of personnel has been carried out.

Summary: Incident Report: ECC Test Certificate Misissuance → Incident Report: eMudhra emSign CA's ECC Test Certificate Misissuance
Assignee: bwilson → vijay
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]
Summary: Incident Report: eMudhra emSign CA's ECC Test Certificate Misissuance → eMudhra: emSign CA ECC Test Certificate Misissuance

Thanks for this disclosure. I believe that this incident can be closed and intend to close it on or about 9-October-2020 unless there are questions or concerns that need further discussion.

Flags: needinfo?(bwilson)

Ben: This report seems to be in the spirit of the negative example from https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report , specifically:

For example, it’s not sufficient to say that “human error” or “lack of training” was a root cause for the incident, nor that “training has been improved” as a solution.

That is, this response states that:

This resulted from a misconfiguration of one particular procedure for the manual issuance of test certificates, in which the certificate profile contained the wrong Key Usage value.

and that the solution is:

All manual procedures have been updated in our internal operating procedures to perform additional checks after certificate generation, specifically for cases of this kind. The necessary re-training of personnel has been carried out.

I can't help but feel like this provides zero useful information for any other CA, or any relying party, in understanding what went wrong, how it was corrected, and how to avoid similar mistakes.

The misconfiguration is a symptom/statement, not a root cause. What matters is understanding how that misconfiguration was possible, what the previous steps were to ensure correctness prior to certificate generation, and what the new steps are to ensure correctness prior to certificate generation. The statement "necessary re-training" provides no useful detail, because it doesn't even establish that this was a training issue in the first place. However, if we assume it was a training issue, then what were the prior training materials, why were they deficient, and how was that corrected?

I'm concerned that this incident report approaches it as if this was a one-off, rather than a series of systemic failures that ultimately resulted in misissuance. For example, I see nothing that would prevent this from happening, say, next year, when "training wasn't as frequent" or "new folks were hired".

These are the sorts of questions I think we'd like to get to in the incident report. I can't help but say I'm also troubled by statements like:

No impact to external customers. Only internal test certificates impacted.

As, while that's descriptive, it's not dispositive to how a CA treats incidents. It doesn't matter whether external or internal customers were affected; the certificates were misissued in the first place.

An ideal incident report here would identify:

  • How the original configuration error was introduced.
  • What the previous steps were for performing manual ceremonies.
  • How this escaped, say, preissuance linting (which is relevant, even for manual ceremonies).
  • How manual ceremonies have been updated to prevent future mistakes.
  • If this was training, what was the training focused on?
    • For example, was it training about what RFC 5280 et al. require?
    • Was it training about prior incidents of CAs having made this mistake?
    • Was it training on existing procedures for reviewing manual ceremonies?

There's a lot we can learn here, but I'm concerned with any CA that is able to perform manual issuance and miss something. If CAs aren't treating manual ceremonies as incredibly critical, with layers of review and tests before they ever touch CA key material, that's concerning. This incident report is the opportunity to provide a better understanding of what was done and why that shouldn't be a concern.

Flags: needinfo?(vijay)

Vijay,
Could you please respond to Ryan's questions and comments?
Thanks,
Ben

With reference to Ryan’s comment above, it is welcome feedback that gives us a better understanding of the purpose of disclosure. It is a learning for our team as well for reporting any future incidents (should they happen), and we will be attentive to these aspects. Thanks.

On the incident, we do not wish to characterize it as a ‘lack of training’ issue. Quoting our initial statement: “This resulted from a misconfiguration of one particular procedure for the manual issuance of test certificates” AND “All manual procedures have been updated in our internal operating procedures to perform additional checks after certificate generation, specifically for cases of this kind. The necessary re-training of personnel has been carried out.”

The re-training here mainly refers to the ‘updated procedures’ point, which is our standard practice whenever we make an update. This training is incidental, and we did not intend it as the problem or the solution. The training also covered RFC requirements and a couple of similar misissuances by other CAs in the past.

We believe the explanation of reasons below, giving cause and solution in addition to what we stated earlier, provides more clarity on the points highlighted in the comment.

Reason #1: The profile misconfiguration resulted from an oversight, the profile being regarded as ‘only for test certificates’. The procedure was cloned from an RSA profile, which carried over the wrong Key Usage value. As a solution, we have introduced an additional check by a second individual, adding a verification layer to avoid oversight mistakes in profile configurations.

Reason #2: The linting process was not performed, again because the certificates were regarded as ‘only for test certificates’. As a solution, we have updated the procedures so that lint tests cover all cases, including test certificates.
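The remediation for Reason #2 can be sketched as an unconditional pre-issuance gate. This is a minimal illustration, assuming hypothetical names (lint_certificate, gate_issuance) and a toy lint rule standing in for a full linter such as zlint; it is not eMudhra's actual pipeline.

```python
# Hypothetical pre-issuance linting gate (illustrative only).
# The old behaviour skipped linting for test profiles; the updated
# procedure lints every to-be-signed profile, test certificates included.

def lint_certificate(tbs_profile: dict) -> list:
    """Toy lint: flag keyEncipherment on an ECDSA profile (the bug here)."""
    errors = []
    if (tbs_profile.get("key_algorithm") == "ECDSA"
            and "keyEncipherment" in tbs_profile.get("key_usage", set())):
        errors.append("keyEncipherment not allowed for ECDSA keys")
    return errors

def gate_issuance(tbs_profile: dict) -> bool:
    """Refuse to issue if any lint error is found -- no test-cert exemption."""
    errors = lint_certificate(tbs_profile)
    if errors:
        raise ValueError("; ".join(errors))
    return True
```

The design point is that the gate takes no "is this a test certificate" flag at all, so there is no code path on which linting can be skipped.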

We believe this gives better clarity on the incident details.
(In reply to Ben Wilson from comment #3)

Vijay,
Could you please respond to Ryan's questions and comments?
Thanks,
Ben

Flags: needinfo?(vijay)

I'll push closure of this out for another week - setting a tentative closure date of 16-Oct-2020.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED