Open Bug 1943379 Opened 16 days ago Updated 3 days ago

Actalis: CRL with duplicate serial number in revokedCertificates

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: marco.menonna, Assigned: marco.menonna)

Details

(Whiteboard: [ca-compliance] [crl-failure])

Attachments

(1 file)

On January 21 at 21:47 (CET), we received a CPR (certificate problem report) notifying us of a problem in one of our CRLs.
In particular, one of our CRLs presented a duplicate entry (same serial number).
We acknowledge the issue and confirm that we have started an investigation.
We will post a preliminary incident report soon.

Assignee: nobody → marco.menonna
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [crl-failure]

Summary

On the evening of 01/21/2025, Actalis received a certificate problem report informing us that "the CRL issued by Actalis Domain Validation Server CA G3 contains two entries for certificate serial #5A6E9BB44E7AE6EB95A07E20A37B6D07".

In response to this CPR, on the morning of Jan 22, Actalis acknowledged receipt and started investigating the issue.

Our CRLs are generated by the EJBCA software based on a table in its database that contains all certificates and their status.

Knowing this, we examined the said table and, as expected, found two rows with the same serial number (5A6E9BB44E7AE6EB95A07E20A37B6D07), same issuer (Actalis Domain Validation Server CA G3), and revoked status, which explains how the anomalous CRL originated. We found that one of the two was the pre-certificate, normally absent once the final certificate has been issued. We fixed the problem by eliminating the spurious row (the pre-certificate) from the database. Then, by forcing the regeneration of the CRL, we found that the duplication had disappeared. We carefully examined all our other CRLs without finding similar problems.
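The database check described above can be sketched as a grouping query. This is only an illustration: it uses an in-memory SQLite database and a hypothetical, simplified table (`cert_rows`, with made-up column names), not the real EJBCA schema or database engine. The third row is an invented control entry.

```python
import sqlite3

# Hypothetical, simplified stand-in for the certificate table; the real
# EJBCA schema has more columns and may use different names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cert_rows (issuer TEXT, serial TEXT, kind TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO cert_rows VALUES (?, ?, ?, ?)",
    [
        ("Actalis Domain Validation Server CA G3",
         "5A6E9BB44E7AE6EB95A07E20A37B6D07", "pre-certificate", "revoked"),
        ("Actalis Domain Validation Server CA G3",
         "5A6E9BB44E7AE6EB95A07E20A37B6D07", "certificate", "revoked"),
        # Invented control row that must not be reported.
        ("Actalis Domain Validation Server CA G3",
         "00AA", "certificate", "active"),
    ],
)


def find_duplicate_rows(connection):
    """Return (issuer, serial, row_count) for each pair stored more than once."""
    query = (
        "SELECT issuer, serial, COUNT(*) FROM cert_rows "
        "GROUP BY issuer, serial HAVING COUNT(*) > 1"
    )
    return connection.execute(query).fetchall()


duplicates = find_duplicate_rows(conn)
for issuer, serial, count in duplicates:
    print(f"{count} rows for serial {serial} issued by {issuer}")
```

Grouping on the (issuer, serial) pair rather than on the serial alone mirrors the fact that serial numbers only need to be unique per issuer.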

In the meantime, we investigated the possible causes of the problem. We found evidence in the logs that on 9/30/2024 the EJBCA software encountered an internal error of a type that we had never seen before. We have asked the EJBCA software vendor for an explanation and, if possible, a fix, and are waiting for a response.

We are setting up a monitoring system to detect any recurrence of the problem.

A full incident report will be posted by next week.

Can you clarify whether the two entries had the same revocationDate, reasonCode, and other extensions?

We confirm that the two entries differed only in their revocationDate. Further details will be available in the incident report.

Preliminary Incident Report

Summary

On Jan 21, 2025, Actalis was sent a Certificate Problem Report informing us that "the CRL issued by Actalis Domain Validation Server CA G3 contains two entries for certificate serial #5A6E9BB44E7AE6EB95A07E20A37B6D07". This violates section 5.1.2.6 of RFC5280 which prescribes that "Certificates revoked by the CA are uniquely identified by the certificate serial number".

Impact

Only the CRLs issued by one specific subordinate CA were affected, and in that CRL only one specific entry was found to be duplicated.

According to our findings, the duplicated entry was present in the said CRL (which is published at http://crl06.actalis.it/Repository/AUTHDV-G3/getLastCRL) from 9/30/2024 onwards, until 1/22/2025 when we fixed it.

There were no impacts on certificates.

Timeline

All times are CET.

2024-09-20:

  • 11:35 An SSL DV certificate was requested, was assigned serial 5A6E9BB44E7AE6EB95A07E20A37B6D07, and was issued by Actalis Domain Validation Server CA G3

2024-09-30:

  • 11:35 The above certificate was requested to be revoked. At that time, EJBCA found an anomalous situation in its database. As a result, EJBCA inserted two entries in the CRL with the same serial number above. From this point on, the CRL presented this duplication, which was later removed (see below).

2025-01-21:

  • 21:46 Actalis was sent a CPR notifying us that the CRL issued by "Actalis Domain Validation Server CA G3" contained two entries with the same serialNumber (5A6E9BB44E7AE6EB95A07E20A37B6D07) - see the attached CRL #6583.

2025-01-22:

  • 08:45 The group of people responsible for monitoring CPRs received the above-mentioned report, read it, and alerted the various internal stakeholders.
  • 14:56 We acknowledged receipt of the CPR.
  • 15:21 We found evidence of a likely malfunction in the EJBCA software, but not in the other software components and processes that our CA relies on.
  • 15:43 We decided to fix the specific problem by removing the unwanted row in the CA certificate DB table, then forcing regeneration of the CRL. From this point on, the CRL no longer presented the duplication. We also decided to set up a monitoring system to detect any recurrence of the problem.

2025-01-23:

  • 10:13 We opened a support ticket with the EJBCA software vendor asking for an explanation and a fix.
  • 17:26 We filed this incident on Bugzilla.

Root Cause Analysis

The serial number in the duplicate CRL entry is that of a TLS DV certificate that was requested and issued on September 20, 2024, but was revoked a few days later (September 30).

On Jan 22, after reading the CPR, we analyzed the logs of the various subsystems that make up our CA infrastructure, but we found no evidence of anomalies anywhere except within the EJBCA software, where we found traces of an internal error with no apparent cause. In particular, we found evidence that on 30/9/2024, when the revocation of that certificate was requested, the EJBCA software encountered the anomalous presence, in its certificate table, of two rows with the same issuer and serial number: one with the pre-certificate data and one with the final certificate data. This was evidently anomalous (since, normally, once a certificate has been issued, the pre-certificate is replaced with the final certificate within the database). Furthermore, these two entries in the database with the same (issuer,serial) were both found to be in the revoked state, and this was apparently the immediate reason why from 30/09/2024 onwards the said CRL began to contain two entries with the same serial number.

On the same Jan 22, we fixed the specific problem of that CRL entry by eliminating the spurious row (the pre-certificate) from the database. Then, by forcing the regeneration of the CRL, we found that the duplication had disappeared. We carefully examined all our other CRLs without finding similar problems.

Subsequently, we opened a support ticket with the EJBCA software vendor asking for an explanation and, if possible, a preventive fix.

In the meantime, we decided to set up a specific monitoring of our CRLs to detect any duplication of serial numbers and, if any is found, alert a group of people.
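A minimal sketch of the duplicate check such monitoring could perform, assuming the CRL has already been parsed into (serial, revocationDate) pairs by some other tool. The function name and input format are hypothetical and do not describe the monitoring actually deployed. The first two sample entries are the ones reported in the affected CRL; the third is invented.

```python
from collections import defaultdict


def find_duplicate_serials(entries):
    """Return {serial: [revocation dates]} for serials listed more than once.

    `entries` is an iterable of (serial_hex, revocation_date) pairs that a
    CRL parser is assumed to have produced already.
    """
    by_serial = defaultdict(list)
    for serial, revocation_date in entries:
        # Normalise case so "5a6e..." and "5A6E..." count as the same serial.
        by_serial[serial.upper()].append(revocation_date)
    return {s: dates for s, dates in by_serial.items() if len(dates) > 1}


sample_entries = [
    ("5A6E9BB44E7AE6EB95A07E20A37B6D07", "2024-09-30T09:35:27Z"),
    ("5A6E9BB44E7AE6EB95A07E20A37B6D07", "2024-09-20T10:07:49Z"),
    ("00AA", "2024-10-01T00:00:00Z"),  # invented, non-duplicated entry
]

duplicate_serials = find_duplicate_serials(sample_entries)
if duplicate_serials:
    # A real monitor would alert the responsible group here instead of printing.
    for serial, dates in duplicate_serials.items():
        print(f"ALERT: serial {serial} appears {len(dates)} times: {dates}")
```

Keeping the revocation dates alongside each serial makes it easy to see whether the duplicated entries also disagree in their metadata.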

Lessons Learned

TBD

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Set up a monitoring system to detect and report the occurrence of duplicate serial numbers in our CRLs. | Detect | 2025-02-28 |
| Apply to the EJBCA software the fix (TBD) suggested by its vendor, to prevent recurrence of the problem. | Prevent | TBD |

Appendix

Details of affected certificates

Not applicable.

What caused the preliminary incident report to be filed so late? Please note that as per CCADB incident report guidelines:

An initial report should be filed within 72 hours of the CA Owner being made aware of the incident. If a full incident report is not yet ready, CA Owners should provide a preliminary report containing an executive summary of the incident and a date by which the full report will be posted. The full incident report must be posted within two weeks of the incident.

Which would give a start time of 01-21 for the CPR, 01-24 for the preliminary report, and 02-04 for the full report.
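The date arithmetic behind these deadlines can be checked with a short sketch. It assumes the clock starts when the CPR was sent (21:46 on 01-21, per the timeline) and that both the 72-hour and the two-week windows are measured from that same instant; both assumptions are one possible reading of the guidelines, not a settled interpretation. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def reporting_deadlines(clock_start):
    """Return (preliminary_deadline, full_report_deadline) for a given start."""
    preliminary = clock_start + timedelta(hours=72)
    full_report = clock_start + timedelta(weeks=2)
    return preliminary, full_report


# Assumed clock start: the time the CPR was sent, taken from the timeline.
cpr_sent = datetime(2025, 1, 21, 21, 46)
preliminary_due, full_due = reporting_deadlines(cpr_sent)

print("Preliminary report due:", preliminary_due.date())
print("Full report due:", full_due.date())
```

Choosing a different clock start shifts both deadlines by the same amount.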

Thank you for this report! I agree that having two entries on a CRL with the same serial number but different metadata (in this case a different revocationDate, but in a worse case it could be a different reasonCode) is problematic: how is a relying party to know which entry is correct?

However, I do not think that this is actually a violation of RFC 5280. As stated at the top of this report, RFC 5280, Section 5.1.2.6 says (emphasis added):

When there are no revoked certificates, the revoked certificates list
MUST be absent. Otherwise, revoked certificates are listed by their
serial numbers. Certificates revoked by the CA are uniquely
identified by the certificate serial number. The date on which the
revocation occurred is specified. The time for revocationDate MUST
be expressed as described in Section 5.1.2.4.

First, this is not normative language within the RFC. It is preceded by a normative sentence and followed by a normative sentence, but is not itself placing a normative requirement on the contents of a CRL, as evidenced by the lack of any RFC2119 keywords.

Second, the plain English reading of this sentence is that certificates are uniquely identified by serial numbers, which is true. It is not saying that CRL entries are uniquely identified by serial numbers. The purpose of this sentence is to remind the reader that only a serial number is required to uniquely identify the certificate corresponding to the CRL entry, because serial numbers are already required to be unique among all certificates issued by a given issuer.

So my conclusion is that, while having two CRL entries with the same serial number and different metadata is certainly a problem and probably should be explicitly forbidden, the language quoted at the top of this report does not actually forbid this situation.

What caused the preliminary incident report to be filed so late? Please note that as per CCADB incident report guidelines:

An initial report should be filed within 72 hours of the CA Owner being made aware of the incident. If a full incident report is not yet ready, CA Owners should provide a preliminary report containing an executive summary of the incident and a date by which the full report will be posted. The full incident report must be posted within two weeks of the incident.

We considered the content of the post dated 01-24 as a preliminary report, as it already contained the basic elements, including an executive summary of the incident and an expected timeline for the full report. Of course, if further clarification is needed, we can provide additional details.

Incident Report

Summary

On Jan 21, 2025, Actalis was sent a Certificate Problem Report informing us that "the CRL issued by Actalis Domain Validation Server CA G3 contains two entries for certificate serial #5A6E9BB44E7AE6EB95A07E20A37B6D07". This violates section 5.1.2.6 of RFC5280 which prescribes that "Certificates revoked by the CA are uniquely identified by the certificate serial number".

Impact

Only the CRLs issued by one specific subordinate CA were affected, and in that CRL only one specific entry was found to be duplicated.

According to our findings, the duplicated entry was present in the said CRL (which is published at http://crl06.actalis.it/Repository/AUTHDV-G3/getLastCRL) from 9/30/2024 onwards, until 1/22/2025 when we fixed it.

There were no impacts on certificates.

Timeline

All times are CET.

2024-09-20:

  • 11:35 An SSL DV certificate was requested, was assigned serial 5A6E9BB44E7AE6EB95A07E20A37B6D07, and was issued by Actalis Domain Validation Server CA G3

2024-09-30:

  • 11:35 The above certificate was requested to be revoked. At that time, EJBCA found an anomalous situation in its database. As a result, EJBCA inserted two entries in the CRL with the same serial number above. From this point on, the CRL presented this duplication, which was later removed (see below).

2025-01-21:

  • 21:46 Actalis was sent a CPR notifying us that the CRL issued by "Actalis Domain Validation Server CA G3" contained two entries with the same serialNumber (5A6E9BB44E7AE6EB95A07E20A37B6D07) - see the attached CRL #6583.

2025-01-22:

  • 08:45 The group of people responsible for monitoring CPRs received the above-mentioned report, read it, and alerted the various internal stakeholders.
  • 14:56 We acknowledged receipt of the CPR.
  • 15:21 We found evidence of a likely malfunction in the EJBCA software, but not in the other software components and processes that our CA relies on.
  • 15:43 We decided to fix the specific problem by removing the unwanted row in the CA certificate DB table, then forcing regeneration of the CRL. From this point on, the CRL no longer presented the duplication. We also decided to set up a monitoring system to detect any recurrence of the problem.

2025-01-23:

  • 10:13 We opened a support ticket with the EJBCA software vendor asking for an explanation and a fix.
  • 17:26 We filed this incident on Bugzilla.

2025-01-24:

  • 18:16 The EJBCA vendor recommended that we create a new index on the CertificateData table as a temporary workaround.

2025-01-27:

  • 09:25 We asked our Operations Department to set up a specific monitoring of all our CRLs to detect any duplication of serial numbers

2025-01-30:

  • 12:41 The EJBCA vendor acknowledged the existence of a bug, which they will fix in a future release of the EJBCA software.

2025-02-03:

  • 09:38 We asked our Database Administrator Department to apply the workaround suggested by our vendor.

Root Cause Analysis

The serial number in the duplicate CRL entry is that of a TLS DV certificate that was requested and issued on September 20, 2024, but was revoked a few days later (September 30).

On Jan 22, after reading the CPR, we analyzed the logs of the various subsystems that make up our CA infrastructure, but we found no evidence of anomalies anywhere except within the EJBCA software, where we found traces of an internal error with no apparent cause. In particular, we found evidence that on 30/9/2024, when the revocation of that certificate was requested, the EJBCA software encountered the anomalous presence, in its certificate table, of two rows with the same issuer and serial number: one with the pre-certificate data and one with the final certificate data. This was evidently anomalous (since, normally, once a certificate has been issued, the pre-certificate is replaced with the final certificate within the database). Furthermore, these two entries in the database with the same (issuer,serial) were both found to be in the revoked state, and this was apparently the immediate reason why from 30/09/2024 onwards the said CRL began to contain two entries with the same serial number.

On the same Jan 22, we fixed the specific problem of that CRL entry by eliminating the spurious row (the pre-certificate) from the database. Then, by forcing the regeneration of the CRL, we found that the duplication had disappeared. We carefully examined all our other CRLs without finding similar problems.

Subsequently, we opened a support ticket with the EJBCA software vendor asking for an explanation and, if possible, a preventive fix. The vendor eventually acknowledged that the problem was caused by a bug and recommended that we create an additional index on the certificate table as a temporary workaround. This recommendation was discussed at length before being applied, as we had some doubts about its rationale (the additional index is described in the EJBCA documentation as an "optimization"). In the end, we decided to apply the workaround, as the vendor reiterated that it was the best course of action while waiting for a real fix, which will come with a future version of EJBCA.

In the meantime, we decided to set up a specific monitoring of our CRLs to detect any duplication of serial numbers and, if any is found, alert a group of people.

Lessons Learned

This incident teaches us that it is better never to take even the most obvious things for granted, such as the absence of duplicated serial numbers in a CRL generated by well-known third-party software that is widely used in the industry.

What went well

  • We have been able to quickly fix the specific problem of the affected CRL.

What didn't go well

  • /

Where we got lucky

  • The problem did not affect any other of our CRLs.
  • The problem didn't affect certificates.

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Set up a monitoring system to detect and report the occurrence of duplicate serial numbers in our CRLs. | Detect | 2025-02-03 |
| Apply to the EJBCA software the fix suggested by its vendor, to prevent recurrence of the problem. | Prevent | 2025-02-12 |

Appendix

Details of affected certificates

Not applicable.

(In reply to Marco Menonna from comment #8)

What caused the preliminary incident report to be filed so late? Please note that as per CCADB incident report guidelines:

An initial report should be filed within 72 hours of the CA Owner being made aware of the incident. If a full incident report is not yet ready, CA Owners should provide a preliminary report containing an executive summary of the incident and a date by which the full report will be posted. The full incident report must be posted within two weeks of the incident.

We considered the content of the post dated 01-24 as a preliminary report, as it already contained the basic elements, including an executive summary of the incident and an expected timeline for the full report. Of course, if further clarification is needed, we can provide additional details.

The part that made me not think it was a Preliminary Incident Report was that it was titled:

(In reply to Marco Menonna from comment #1)

Summary

A full incident report will be posted by next week.

Then a week later we received the Preliminary Incident Report:

(In reply to Marco Menonna from comment #5)

Preliminary Incident Report

Notably this isn't the full incident report that was mentioned in the 01-24 Comment, which claimed a date for when the full report would be posted.

There seems to be some minor confusion over what a Preliminary Incident Report contains by comparing the Summary and Preliminary comments. Comment 5 makes it clear that standard preliminary reports are generally understood by Actalis, however it lacks a date for the full report while being clear that the report is preliminary (e.g. Lessons Learned). So this isn't a case of the full report being mislabeled, which has happened for other CAs.

Just for clarification on this incident: when does Actalis consider the clock to start?

  1. 2025-01-21 21:46 (CEST) Actalis was sent a CPR notifying us that the CRL issued by "Actalis Domain Validation Server CA G3" contained two entries with the same serialNumber (5A6E9BB44E7AE6EB95A07E20A37B6D07) - see the attached CRL #6583.
  2. 2025-01-22 08:45 (CEST) The group of people responsible for monitoring CPRs received the above-mentioned report, read it, and alerted the various internal stakeholders.
  3. 2025-01-22 14:56 (CEST) We acknowledged receipt of the CPR.
  4. 2025-01-22 15:21 (CEST) We found evidence of a likely malfunction in the EJBCA software, but not in the other software components and processes that our CA relies on.

There have been different interpretations by other CAs, and ideally the community as a whole can come to some stricter definition to make everyone happy.

Thank you for this report. I do have two questions:

Serial Number: 5A6E9BB44E7AE6EB95A07E20A37B6D07 Revocation Date: Sep 30 09:35:27 2024 GMT

Serial Number: 5A6E9BB44E7AE6EB95A07E20A37B6D07 Revocation Date: Sep 20 10:07:49 2024 GMT

  • The affected CRL shows two different dates for the revocation of this certificate / serial number, as you also confirmed in comment #3. Were you able to find the cause of the discrepancy for these dates?
    • If there was a malfunction, originally triggered by the revocation request on September 30th, I would have expected both entries to still have the same revocation date.
  • I also notice that the latest CRL lists the later of the two distinct dates provided by the earlier CRLs. Was there any particular reason you chose to keep the later date rather than the earlier date?
    • I ask this because I believe changing the revocation date of a CRL entry to a later point in time should be considered the same as removing a CRL entry and re-adding it later (i.e. unrevoking a certificate for a certain amount of time).
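The concern in the last point can be illustrated with a small sketch. It assumes a simplified relying party that treats a certificate as revoked from its revocationDate onwards, and it uses an arbitrary probe time between the two dates; the function name and the probe are hypothetical.

```python
from datetime import datetime, timezone

# The two revocation dates listed for the same serial in the affected CRL.
EARLIER = datetime(2024, 9, 20, 10, 7, 49, tzinfo=timezone.utc)
LATER = datetime(2024, 9, 30, 9, 35, 27, tzinfo=timezone.utc)


def is_revoked_at(revocation_date, check_time):
    """Simplified model: revoked from revocation_date onwards."""
    return check_time >= revocation_date


# Arbitrary probe time that falls between the two dates.
probe = datetime(2024, 9, 25, 12, 0, tzinfo=timezone.utc)

revoked_with_earlier = is_revoked_at(EARLIER, probe)
revoked_with_later = is_revoked_at(LATER, probe)
window = LATER - EARLIER

print("Revoked at probe using earlier date:", revoked_with_earlier)
print("Revoked at probe using later date:", revoked_with_later)
print("Length of the disputed window:", window)
```

Under this model, the answer for any time inside the window depends entirely on which of the two entries is kept.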

I share Aaron's opinion that this bug might not be an actual violation, though I haven't looked deeply at that case. Yet I'm glad to see Actalis went ahead with an incident report, given that there are several publicly trusted CAs out there utilizing EJBCA.

Flags: needinfo?(marco.menonna)

Martijn, on your two questions (respectively):

  • we would like to know too, but unfortunately we still don't know
  • because that is the date on which the revocation of the certificate was requested

Just for clarification on this incident: when does Actalis consider the clock to start?

We confirm that we consider the clock to start on 2025-01-22 at 08:45 (CET), when the CPR was read and the incident was acknowledged internally, in line with the CCADB incident report guidelines, which indicate that the clock starts when the CA becomes aware of the issue.

Flags: needinfo?(marco.menonna)