Closed Bug 1824319 Opened 1 year ago Closed 8 months ago

Actalis: pre-certificates with “certificateHold” as the revocation reason

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adriano.santoni, Assigned: adriano.santoni)

Details

(Whiteboard: [ca-compliance] [crl-failure])

Steps to reproduce:

We received a report that eight certificates, issued by Actalis, have been revoked with the reason "certificateHold".

Following the initial analysis, we have obtained a preliminary understanding of the incident. It appears that all of the affected certificates are pre-certificates for which the corresponding certificates were never issued. The issue was caused by a background service in EJBCA that automatically revokes pre-certificates that do not become certificates, for reasons such as insufficient CT log signatures or other issues that prevent issuance. However, due to a bug in EJBCA, the revocation reason it specified was incorrect. This bug has been resolved in a more recent version of EJBCA.

Once we have completed our investigation, we will promptly publish a full incident report in the appropriate format.

Assignee: nobody → adriano.santoni
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [crl-failure]

This morning we deployed an update to the EJBCA software that fixes the problem.
We will shortly publish a full incident report.

Here is our full incident report:

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.

We received an email from michael.lettona@digicert.com, sent to our mailbox cert-problem@actalis.it, reporting that we had 8 certificates revoked with "certificateHold" as the revocation reason.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was performed.

(all times are local Italian times)
2023-03-23 18:34 | We received an email from michael.lettona@digicert.com reporting that we had 8 certificates revoked with "certificateHold" as the revocation reason; the email specified the sha256 hash, serial number, issuer org, crl reason, and revocation times of the affected certificates.
2023-03-24 07:36 | We began our investigations on the reported issue. First of all, we checked that the report was reliable and accurate, and we found that it was. More precisely, the issue was related to pre-certificates. Then we started looking into the possible causes of the problem.
2023-03-24 08:41 | Since our CA system does not allow the status of a pre-certificate to be modified by any means, whether manually or via APIs, we came to the preliminary conclusion that the issue was related to the background service in the EJBCA software that automatically revokes "orphaned" pre-certificates.
2023-03-24 08:41 | We decided to fix the revocation status of those 8 pre-certificates by modifying the relevant values directly in the CA database, before planning a more general solution.
2023-03-24 10:35 | We finished modifying the revocation reason codes in the CA database for those 8 pre-certificates, then regenerated the CRL and checked that it was as expected. We also checked our OCSP responses for those pre-certificates and found that they were correct.
2023-03-24 11:18 | We found sufficient evidence that the root cause of the issue was indeed a bug in the EJBCA software. At this point we started discussing the options available to fix the problem permanently. We finally decided to schedule an EJBCA software update for the morning of March 29th.
2023-03-29 07:30 | We deployed an update to the EJBCA software that fixes the said bug.

  3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

This incident was not related to the issuance of TLS certificates.

  4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

A total of eight (8) pre-certificates were affected, as listed in section 5 below.

  5. In a case involving TLS server certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. It is also recommended that you use this form in your list "https://crt.sh/?sha256=[sha256-hash]", unless circumstances dictate otherwise. When the incident being reported involves an SMIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA256 hash of the certificate. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

https://crt.sh?q=1d4ee4cf692e36ec8784da67765ce7abe3496fd95477682e48f38215bcf4b31a
https://crt.sh?q=34aa0e625dfdffa3993f1d7f169cc1e69e5d5b385d43692a62aeb8519041ca5e
https://crt.sh?q=6f3dfe39d158cd3fefcfb1c4a0d605c0d75b6b9418032f4a47812f2e84ad9c70
https://crt.sh?q=7e8a0042770fd460b384fa53d44b2809f849a3c91e081f9a3849c25e7257e460
https://crt.sh?q=9cfc91598fa88577b5eb2293c4c061bd5062c0661a2f21085fcf27ec59403efd
https://crt.sh?q=9ebd24ed2575267a5401764fe2b2c7f4c9bb68b0bde19c36cdda23c7bb07301c
https://crt.sh?q=c28dacdee6d6897440339d6905fead80cee3f96c5075a8e3a5f982f6e7cbb169
https://crt.sh?q=e7f50617aec00fe1715bddbce0136e5634a7fccd4c9040f66e157cff86fb3537
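
The template above recommends the "https://crt.sh/?sha256=[sha256-hash]" form for such lists. As a minimal sketch, links in that form could be built from the fingerprints as follows; the function name `crtsh_url` is an illustrative assumption, and only the first two hashes from the list are repeated here for brevity.

```python
import re

# A SHA-256 fingerprint is 64 hexadecimal characters.
_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def crtsh_url(sha256_hex: str) -> str:
    """Return the crt.sh lookup URL for a SHA-256 certificate fingerprint."""
    # Accept upper-case and colon-separated fingerprints as well.
    normalized = sha256_hex.strip().lower().replace(":", "")
    if not _SHA256_RE.fullmatch(normalized):
        raise ValueError(f"not a SHA-256 fingerprint: {sha256_hex!r}")
    return f"https://crt.sh/?sha256={normalized}"


fingerprints = [
    "1d4ee4cf692e36ec8784da67765ce7abe3496fd95477682e48f38215bcf4b31a",
    "34aa0e625dfdffa3993f1d7f169cc1e69e5d5b385d43692a62aeb8519041ca5e",
]
for fp in fingerprints:
    print(crtsh_url(fp))
```

Validating the length and alphabet before building the link guards against truncated or mistyped hashes ending up in a report.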

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The issue was caused by a background service in the EJBCA software that automatically revokes pre-certificates that do not become certificates, for reasons such as insufficient CT log signatures or other issues that prevent issuance. However, due to a bug in the EJBCA software, the revocation reason code was set to 6 (certificateHold), which is incorrect because that reason code is prohibited by the BR. The EJBCA v7.11 release notes did not highlight this bug as critical from a compliance point of view, and we found no other reason to install the new EJBCA release urgently, so we had planned to install it later.
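
For illustration only, a check for the condition described in this section might look like the following minimal Python sketch. The reason-code values are those defined in RFC 5280; the `(serial, reason_code)` input shape, the function name, and the sample serial numbers are assumptions, and extracting such pairs from a real CRL is left to whatever parser is available. This is not Actalis's or EJBCA's code.

```python
from enum import IntEnum
from typing import Iterable, List, Optional, Tuple


class CRLReason(IntEnum):
    """CRL entry reason codes as defined in RFC 5280 (value 7 is unused)."""
    unspecified = 0
    keyCompromise = 1
    cACompromise = 2
    affiliationChanged = 3
    superseded = 4
    cessationOfOperation = 5
    certificateHold = 6
    removeFromCRL = 8
    privilegeWithdrawn = 9
    aACompromise = 10


# Only certificateHold is listed because it is the code at issue here;
# a real check would cover every requirement that applies to the CA.
PROHIBITED = frozenset({int(CRLReason.certificateHold)})


def find_prohibited(
    entries: Iterable[Tuple[str, Optional[int]]]
) -> List[str]:
    """Return the serials of entries whose reason code is prohibited.

    An entry without a reasonCode extension is represented by None.
    """
    return [serial for serial, code in entries if code in PROHIBITED]


# Made-up serial numbers, for illustration only.
sample = [("0a01", 4), ("0a02", 6), ("0a03", None)]
print(find_prohibited(sample))
```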

  7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future. The steps should include the action(s) for resolving the issue, the status of each action, and the date each action will be completed.

On March 29th, we deployed an update to the EJBCA software that fixes the said bug. We do not expect the same problem to occur again, at least not for the same cause.

We are discussing with the software vendor how to improve the information we receive about compliance-affecting bugs and related fixes.

Did Actalis review the similar incident report in bug 1824257? I have some comments similar to those in that bug.

The timeline is missing key events, such as when the software update that introduced the bug was installed and when the incident started. The timeline shouldn't start when the incident was reported, because there must have been preceding events that caused the incident.

This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was performed.

In section 4, the report is missing dates for the certificates.

For each problem: the number of certificates, and the date the first and last certificates with that problem were issued.

I think this report misidentifies the root cause of the incident; providing more detail about the activities that led up to the incident would clarify that. A bug in EJBCA software cannot be the root cause, because EJBCA doesn't control the Actalis CA. Section 6 should identify what changes to processes or deployment of software by Actalis caused this incident and what validations and decision making were done at that time. Section 7 should then list how Actalis will prevent future bugs from causing incidents and how any software changes/new processes will be evaluated to ensure they meet all baseline and root store requirements.

Mathew,

on your first question: yes, we did review the similar incident report in bug 1824257.

Regarding the missing dates:

  • first affected (pre)certificate was issued on 2023-03-01 12:05:54 CET
  • last affected (pre)certificate was issued on 2023-03-21 10:01:13 CET

We performed a search of our CA's database and found no other revoked (pre)certificates with "certificateHold" as the revocation reason code.

I will provide an answer to your remaining questions shortly.

Mathew, find below additional explanations and details.

Version 7.7.1 of the EJBCA software, released in September 2021, introduced the previously mentioned background service ("Pre-Certificate Revocation Service"), which automatically revokes "orphan" pre-certificates (i.e. those that do not result in certificates). We enabled that job (which by default is inactive) because, due to how our OCSP services work, we need orphaned pre-certificates to be revoked in order to give a correct OCSP response in all cases, in compliance with the BR. Please note that we use our own OCSP responder, and not the one built into the EJBCA software.

From our investigations following the March 23rd report, we found that the background service first did its job on March 1st, when it revoked the first of the 8 pre-certificates impacted by this incident. More generally, all 8 pre-certificates under discussion were revoked on the same day they were generated, so the first one was revoked on March 1st and the last one on March 21st. Unfortunately, due to the already mentioned bug, those pre-certificates were revoked with reasonCode "certificateHold". We became aware of this issue on March 23rd, following the report. We didn't notice it earlier because we don't have a system that automatically checks our CRLs for compliance. That being said, I would say that the incident did in fact begin on March 1, 2023.

As for the corrective measures we intend to take to prevent the problem from recurring:

  • EJBCA software update (already done on March 29)
  • Implementation of an automation to verify the compliance of our CRLs and our OCSP responses (by end of May)
  • Improvement of our change management processes especially regarding the analysis of the release notes (this is being evaluated).

Hi Adriano,

We enabled that job (which by default is inactive) because, due to how our OCSP services work, we need orphaned pre-certificates to be revoked in order to give a correct OCSP response in all cases, in compliance with the BR

Can you please explain which part of the BRs you refer to? AFAIK there is no requirement to revoke an "orphaned" pre-certificate, and doing so would only increase the size of your CRL without much benefit. Obviously, we don't want an OCSP responder to respond with "Unknown" since it is assumed that the certificate exists, but wouldn't it make more sense for the OCSP responder to respond with "Good" instead of "Revoked"?

This bug has a lot of similarities with bug https://bugzilla.mozilla.org/show_bug.cgi?id=1824257.

Dimitris, I was referring to BR §4.9.10, but also to the MRSP requirements (5.4 Precertificates).

Dimitris, please note that I have not said that we revoke "orphaned" pre-certificates because it's required by the BR (we know that it's not).

Thanks for the clarification, I initially read the "in compliance with the BR" as a statement that it was somehow required to be revoked, but we both agree that there is no such requirement.

(In reply to Adriano Santoni from comment #5)

As for the corrective measures we intend to take to prevent the problem from recurring:

  • EJBCA software update (already done on March 29)
  • Implementation of an automation to verify the compliance of our CRLs and our OCSP responses (by end of May)
  • Improvement of our change management processes especially regarding the analysis of the release notes (this is being evaluated).

What is the status of these resolution steps? CAs are expected to provide updates at least every week on their progress, as documented at https://www.ccadb.org/cas/incident-report

Hi Mathew,

I apologize for not providing updates at least every week. I confirm that the three corrective measures have all been implemented:

  • The EJBCA software update was done on March 29th as mentioned in comment #5;
  • An automated check of the compliance of our CRLs and OCSP responses was developed, tested and deployed between April and May and has been in production ever since; we developed our own solution because the third-party linters we considered (e.g. zlint) are not yet usable for this purpose and/or cannot be easily integrated with our CA;
  • As regards our change management processes, especially the analysis of release notes, we have set up automatic monitoring of the publication of release notes (so as not to rely solely on the announcement emails from the vendor) and a four-eyes review of them; we have also established that, if there is any doubt about the criticality of a fix, we will carry out an in-depth investigation with the vendor.
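
As an illustration of the release-note review described in the last item, a first-pass filter could flag entries that mention compliance-sensitive terms so that they are examined first. The following is a minimal Python sketch under stated assumptions: the keyword list, the function name, and the sample note lines are all hypothetical, and none of them is taken from the EJBCA release notes or from Actalis's actual tooling.

```python
import re
from typing import Iterable, List, Sequence

# Hypothetical terms suggesting that a fix may affect compliance.
KEYWORDS = (
    "revocation",
    "revoke",
    "reason code",
    "crl",
    "ocsp",
    "precertificate",
    "pre-certificate",
)


def flag_compliance_notes(
    lines: Iterable[str], keywords: Sequence[str] = KEYWORDS
) -> List[str]:
    """Return the release-note lines that mention any keyword (case-insensitive)."""
    pattern = re.compile(
        "|".join(re.escape(k) for k in keywords), re.IGNORECASE
    )
    return [line for line in lines if pattern.search(line)]


# Invented release-note lines, for illustration only.
sample_notes = [
    "Fixed wrong revocation reason set by a background service",
    "Updated admin UI colours",
]
for note in flag_compliance_notes(sample_notes):
    print("REVIEW:", note)
```

A keyword match only prioritises an entry for human review; it does not replace the four-eyes examination or the follow-up with the vendor.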

We have no further updates.

I will close this on or about Wed. 19-Jul-2023 unless further discussion is required.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED