Closed Bug 1511459 Opened 10 months ago Closed 9 months ago

Certum CA: Corrupted certificates

Categories

(NSS :: CA Certificate Compliance, task)

3.37
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wtrapczynski, Assigned: wtrapczynski)

Details

(Whiteboard: [ca-compliance])

Attachments

(1 file)

799.66 KB, application/x-x509-ca-cert
Details
Attached file certs.pem
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36



Actual results:

This is an incident report for the issue, prepared according to https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

10.11.2018 10:10 (UTC±00:00) – We received a notification from our internal monitoring system concerning issues with publishing CRLs.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

(All times in UTC±00:00)

10.11.2018 10:10 – We received a notification from our internal monitoring system for certificate and CRL issuance about problems with publishing CRLs. We started verification.
10.11.2018 12:00 – We established that one of our roughly 50 CRLs had a corrupted digital signature value. We noticed that this CRL was much larger than the others. We verified that, within a short period of time, over 30,000 certificates had been added to this CRL.
10.11.2018 15:30 – We confirmed that the signing module had trouble signing CRLs larger than 1 MB. We started working on a fix.
10.11.2018 18:00 – We disabled the automatic publication of this CRL. We verified that the other CRLs had correct signatures.
11.11.2018 07:30 – As part of the post-failure verification procedure, we started an inspection of the whole system, including all certificates issued during that time.
11.11.2018 10:00 – We verified that some of the certificates issued during that time had corrupted digital signatures.
11.11.2018 10:40 – We established that one of the several signing modules working in parallel was producing corrupted signatures. We turned it off.
11.11.2018 18:00 – We confirmed that the cause of the corrupted certificate signatures was the large CRL, which prevented that signing module from operating correctly afterwards.
11.11.2018 19:30 – We left only one signing module running, which prevented further mis-issuance.
19.11.2018 11:00 – We deployed to production an additional digital signature verification step in an external module, outside the signing module.
19.11.2018 21:00 – We deployed to production a new version of the signing module that correctly handles large CRLs.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

Yes, we have stopped. The last problematic certificate was issued on 11.11.2018 17:47 (UTC±00:00).

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

355 certificates.

The first one: 10.11.2018 01:26:10
The last one: 11.11.2018 17:47:36

All certificates were revoked.

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

The full list of certificates is in the attachment.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The main reason for the corrupted operation of the signing module was the lack of proper handling of large CRLs (greater than 1 MB). When the signing module received such a large list for signing, it was not able to sign it correctly. In addition, the signing module then started to sign incorrectly all subsequent objects it received, i.e. everything submitted after the large CRL.

Because several signing modules were running in parallel when the problem occurred, it did not affect all certificates issued at that time. Our analysis shows that the problem affected about 10% of the certificates issued in that period.

We have been using this signing module for the last few years, and at the time of its implementation the tests did not cover signing such a large CRL. None of our CRLs for SSL certificates had exceeded 100 KB before. The significant increase in the size of one of the CRLs was caused by a mass revocation of certificates by one of our partners (the revocations were for business reasons). In a short time, almost 30,000 certificates were added to that CRL, which is extremely rare.

All of the affected certificates were unusable due to their corrupted signatures.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

We have deployed a new version of the signing module that correctly signs large CRLs. From now on, we are able to sign CRLs of up to 128 MB. In addition, we have improved the part of the signing module responsible for signature verification (at the time of the failure it did not work properly).

We have deployed additional verification of certificate and CRL signatures in an external component, separate from the signing module. This component blocks the issuance of certificates and CRLs that have a corrupted signature.
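
For illustration only, a post-signing verification gate of this kind could look roughly like the sketch below, written with the Python "cryptography" library. The function name and the assumption of RSA signing keys are ours; this is not the actual production component.

    from cryptography import x509
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric import padding

    def certificate_signature_is_valid(cert_pem: bytes, issuer_pem: bytes) -> bool:
        # Re-verify a freshly signed certificate against the issuing CA certificate.
        cert = x509.load_pem_x509_certificate(cert_pem)
        issuer = x509.load_pem_x509_certificate(issuer_pem)
        try:
            # Assumes RSA signing keys; an ECDSA CA key would use ec.ECDSA(...) instead.
            issuer.public_key().verify(
                cert.signature,
                cert.tbs_certificate_bytes,
                padding.PKCS1v15(),
                cert.signature_hash_algorithm,
            )
            return True
        except InvalidSignature:
            return False

A check of this kind, run outside the signing module, rejects any object whose signature does not verify under the issuing CA key before it is published.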

We have extended the monitoring system tests, which will allow us to detect incorrectly signed certificates or CRLs faster.
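As a hypothetical example of such a monitoring test (again our sketch, not the actual implementation), a probe could periodically fetch each published CRL and alert when the signature does not verify or the list is stale:

    import datetime
    import urllib.request

    from cryptography import x509

    def check_published_crl(crl_url: str, issuer_pem: bytes) -> None:
        # Fetch the published CRL and raise if its signature does not verify
        # under the issuing CA key or if it is past its nextUpdate time.
        issuer = x509.load_pem_x509_certificate(issuer_pem)
        with urllib.request.urlopen(crl_url) as response:  # crl_url is a placeholder
            crl = x509.load_der_x509_crl(response.read())

        if not crl.is_signature_valid(issuer.public_key()):
            raise RuntimeError("CRL signature does not verify: " + crl_url)
        if crl.next_update < datetime.datetime.utcnow():
            raise RuntimeError("CRL is past its nextUpdate: " + crl_url)
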
Thank you for the incident report. I have a few questions:

* Why were 30,000 certificates suddenly revoked?
* At any time during this incident were you serving an expired CRL, or a CRL in violation of the BRs?
* Was OCSP impacted? (some CAs generate OCSP responses from CRLs)
* Why did 19 days pass before this was reported to Mozilla?
* Will you please post the incident report to the mozilla.dev.security.policy mailing list?
Assignee: wthayer → wtrapczynski
Flags: needinfo?(wtrapczynski)
Summary: Incident report Certum CA - corrupted certificates → Certum CA: Corrupted certificates
Whiteboard: [ca-compliance]
Wayne, the answers to your questions are below.

* Why were 30,000 certificates suddenly revoked?

Our partner explained to us that these revocations were part of their arrangements with customers: the customers had terminated their contracts. To clarify, there was no key compromise.

* At any time during this incident were you serving an expired CRL, or a CRL in violation of the BRs?

We were not serving an expired CRL at any time. All CRLs also remained available throughout the duration of the issue.

Between 10.11.2018 01:05 (UTC±00:00) and 14.10.2018 07:35 (UTC±00:00) we were serving one CRL with a corrupted signature. On November 14th we generated a correct CRL using dedicated software that we had prepared when it turned out that implementing the final fix would take longer than we initially assumed.

* Was OCSP impacted? (some CAs generate OCSP responses from CRLs)

No, OCSP was not impacted. We were serving correct OCSP responses all the time.

* Why did 19 days pass before this was reported to Mozilla?

At the very beginning we focused on solving the problem in order to prevent further mis-issuance. Then we wanted to deploy all necessary fixes and test them thoroughly. We needed to check whether other components, e.g. OCSP, were working properly during the incident. We had to be sure that everything worked well. Additionally, we had to collect all the data necessary to create the incident report. All these actions took a lot of time, and that is why we published this incident report with a delay of a few days. In addition, we assumed (probably wrongly) that November 19th (when we deployed the final fix) was the date from which we needed to start counting the time to send the incident report to Mozilla.

* Will you please post the incident report to the mozilla.dev.security.policy mailing list?

I have posted it to the mozilla.dev.security.policy mailing list.
Flags: needinfo?(wtrapczynski)
I made a typo: the second date in the answer to question two should be "14.11.2018 07:35 (UTC±00:00)".
Resolving this incident as it appears that remediation is complete and discussion has ended.
Status: UNCONFIRMED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED