Closed Bug 1752636 Opened 2 years ago Closed 2 years ago

SSL.com: Delayed revocation of 53 certificates affected by bug #1750631

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: support, Assigned: support)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Steps to reproduce:

This is a preliminary incident report. Our investigation into this matter is ongoing.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

The issue was discovered when checking revocation actions for Bug #1750631.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2021-12-30T22:56+00:00 A potential security event is detected and registered internally for the issuance of 4 TLS certificates based on validation methods which were prohibited by SC-45. 2 of them are found to be active (non-expired non-revoked).

2022-01-04T16:03+00:00 Revocation of the 2 affected active certificates.

2022-01-17T08:29+00:00 Completion of the internal investigation, with 657 certificates confirmed to be affected, of which 409 are found to be active (target population set to be revoked). Revocation planned to start on 2022-01-21T16:00+00:00 and be completed before 2022-01-21T18:00+00:00.

2022-01-21T16:00+00:00 The RA Administrators report revocation of the target population has been initiated according to plan.

2022-01-21T17:36+00:00 The RA Administrators report revocation of the target population has been completed according to plan.

2022-01-25T07:27+00:00 A follow-up check on the target population reveals that 53 active certificates were not revoked due to a failure of the bulk revocation script.

2022-01-25T17:52+00:00 Per discussions with our internal Security Auditors, the CA Administrators complete revocation of the pending 53 active certificates.

2022-01-26T11:31+00:00 A follow-up check confirms revocation of all affected active certificates.

2022-01-26 Analysis of the revocation problem with the participation of RA Admins, CA Admins and Security Auditors.

2022-01-26 Draft of initial Bugzilla report initiated.

2022-01-26T21:13+00:00 Update of bug #1750631, informing about the delayed revocation of the 53 certificates and our intention to file a separate bug.

2022-01-28 Filed initial Bugzilla report (this document).

  3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

The certificates in question were intended to be included in a bulk revocation per Bug #1750631. We moved to revoke these certificates immediately upon discovery of this issue, and all were confirmed as revoked (see timeline).

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

Fifty-three (53) certificates, issued between 2021-09-01 and 2022-01-05, were not revoked within the required timeline, as defined in section “4.9.1.1 Reasons for Revoking a Subscriber Certificate” of our CP/CPS.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

Serial numbers of the affected certificates and their crt.sh IDs (pre-certificates confirmed to be logged to CT):

0ca69a5ecbf3ead9bf942e5470b556fd (https://crt.sh/?id=5144471968)
06f59bb38ef7878455e67939cb85facb (https://crt.sh/?id=5145419976)
039a2817e0c8e7654f0a440d33229109 (https://crt.sh/?id=5145898000)
065b9270a86670e17217335e0e12d302 (https://crt.sh/?id=5146497443)
047744680f23edfc912d57dbd1b8dccf (https://crt.sh/?id=5146691821)
0376c862764763a1dab17d7b76eff14b (https://crt.sh/?id=5147592580)
545d05921ac19925c8aa1fc90f11dc8e (https://crt.sh/?id=5157101311)
0a3dbc89653fcd1920d7b1158717c88b (https://crt.sh/?id=5158429164)
03a102954a7490a26d4badae42fab21d (https://crt.sh/?id=5181609491)
0dadb6bdc70cbd5d23e932b87828dc8e (https://crt.sh/?id=5181629041)
0ba1b861ce11843db49f5182822f0b2d (https://crt.sh/?id=5225189778)
064fba4dda063b1dfc2c5c90919e7068 (https://crt.sh/?id=5225840916)
0f4af8e3c746d08f06b40dbd9eaa1ee0 (https://crt.sh/?id=5260687598)
02494d186fafe553151bf2543b198caf (https://crt.sh/?id=5270608916)
03e9d68dad6e0069ca2efc7e53c4c07f (https://crt.sh/?id=5274511274)
054eba3e0270d98a02a8297dbe74ec91 (https://crt.sh/?id=5289236483)
0b7e11ebcafbe4850613987e6c0b5495 (https://crt.sh/?id=5315615117)
0a6d3b6a3b644cbd81fcc95b63863365 (https://crt.sh/?id=5343783688)
01853aa4bc4a456d13784313eb555d2e (https://crt.sh/?id=5353480741)
07cf614c444efc429bc61cf0f66a34bd (https://crt.sh/?id=5392916102)
076a620c8e57c193b2c3bca50bf872e0 (https://crt.sh/?id=5481123725)
0a03459c6bda89e501f23312ac0aae33 (https://crt.sh/?id=5493561185)
02c800e7ff3e78f81f8c225e26c1c592 (https://crt.sh/?id=5506391688)
08c1aad24c6b0f12ea3c6ae753fa5529 (https://crt.sh/?id=5517401632)
025f16e6c5f577d57c915ef3044702da (https://crt.sh/?id=5523898954)
0960342ac9f912b83148ab3453c70b3c (https://crt.sh/?id=5537847773)
082fab8ebe58d83c9c44ef1dc0318cb5 (https://crt.sh/?id=5567525342)
0303ed64a89521f616b17e9dfe680337 (https://crt.sh/?id=5568301699)
0f2975e9e9ff608d8b803f7816ed63fe (https://crt.sh/?id=5614556801)
07e607db929c9cef09e58846465b137e (https://crt.sh/?id=5614644978)
02edaf44f83743a46808ca30bef41c7c (https://crt.sh/?id=5614975857)
0efa81d80914d9c252a6f391e8964e54 (https://crt.sh/?id=5619338947)
05c2ddfc4c113185c1b90f453dea44a9 (https://crt.sh/?id=5619764487)
0592c892da681da0d610803f9ec44c90 (https://crt.sh/?id=5664456154)
0e2b35ef59bcdd9dc52bd35a5823024a (https://crt.sh/?id=5664606333)
0205dca3d9c03f9fdc0c5aba9f52b607 (https://crt.sh/?id=5665497245)
045e2c1cfba56fde142921bb8884414c (https://crt.sh/?id=5678103204)
05efc948f8581054103008176b030369 (https://crt.sh/?id=5678420774)
0e872f6e9884fa06b461fde09ef9d0d8 (https://crt.sh/?id=5701954492)
030b74b4ae4d5773a03f1112c12a3b6d (https://crt.sh/?id=5708663139)
013f81d0594373708383ad3963f63cd8 (https://crt.sh/?id=5726364506)
0a09fef71f222581b177645020874b7f (https://crt.sh/?id=5729904633)
0af8efe5ab5009cc61b3bdbf0fa3f652 (https://crt.sh/?id=5745435444)
070b2dd03ceaf07491c7d908822867ae (https://crt.sh/?id=5755132169)
045b28cd88180aa16e44fbb37945c299 (https://crt.sh/?id=5791552352)
0c61c694184d718acf635def08413efd (https://crt.sh/?id=5793600616)
093275058e735903ac537d8a1e214d3c (https://crt.sh/?id=5800760376)
0e67968a0ea61e0d77e32b0950693340 (https://crt.sh/?id=5813474482)
07f1554d49bbc37dfb65e448c6375724 (https://crt.sh/?id=5821852381)
095cfcb3422baae2b0da4cb2cf25738e (https://crt.sh/?id=5838950893)
09ea187539b2457bd33f7db5afacdda2 (https://crt.sh/?id=5888327816)
0181a2a67e8c615dc3c33f3c5807370d (https://crt.sh/?id=5917434129)
04ec9ba0db134faa041a21d80fa1aa10 (https://crt.sh/?id=5919527166)

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

This issue involved errors in handling the revocation of certificates which were included in the target population per Bug #1750631.

The problem was caused by a formatting mistake in the S/N of 53 certificates of the target population. Specifically, during data collection and pre-processing, the leading zeroes of these certificate S/Ns were removed. As a result, those certificates were skipped during the mass revocation process without any errors or warnings being logged.

The issue was not detected immediately after the bulk revocation of the target population because the checks run at that time were sample-based, and the sample did not reveal any problematic revocation. Our follow-up checks included automated scanning of the entire target population to verify certificate status, which detected the issue and identified the exact population of certificates that were not revoked.

  7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Immediate actions are described in steps 2-5 of this report and include the immediate involvement of our internal auditing department and revocation of the affected certificates.

Our investigation is ongoing to reveal more details on this issue and its source. Per our Incident Management Policy, an analysis shall also take place to identify any underlying weaknesses and, according to the results, to decide on the proper measures and improvements in our systems and processes, so that such occurrences are not repeated in the future.

A full incident report shall be filed here when our investigation is complete. In the meantime, we will post regular updates here.

Assignee: bwilson → support
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [[ca-compliance] [delayed-revocation-leaf]
Whiteboard: [[ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [delayed-revocation-leaf]

This is an update to note our progress.

Our analysis has revealed that the source of the issue was the different formatting used by the RA and CA software components of our PKI system when storing the certificate S/N in their respective databases. During the pre-processing of the certificate population affected by Bug #1750631, the certificate S/Ns were re-formatted to their hexadecimal numeric values instead of being kept in their string representations, which dropped any leading zeros. This affected the sub-population of certificates with a S/N starting with one or more zeros (53 of the original 657 certificates affected by Bug #1750631).
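The effect described above can be reproduced with a minimal Python sketch. It is only an illustration, not the actual pre-processing code: the function names `to_arithmetic_hex` and `normalize_serial` are hypothetical, and the fixed width of 32 hex digits is an assumption taken from the serial numbers listed in this bug.

```python
def to_arithmetic_hex(serial: str) -> str:
    """Round-trip a hex serial through an integer, as a numeric conversion would."""
    return format(int(serial, 16), "x")


def normalize_serial(serial: str, width: int = 32) -> str:
    """Canonical string form: lowercase hex, left-padded with zeros to a fixed width.

    The width of 32 hex digits is an assumption based on the serials in this report.
    """
    return serial.strip().lower().zfill(width)


original = "0ca69a5ecbf3ead9bf942e5470b556fd"
trimmed = to_arithmetic_hex(original)

print(trimmed)                               # ca69a5ecbf3ead9bf942e5470b556fd
print(trimmed == original)                   # False: a string lookup no longer matches
print(normalize_serial(trimmed) == original) # True: padding restores the string form
```

Comparing serials only after passing both sides through one canonical form is one way to keep two components with different storage formats in agreement.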

The issue was not discovered immediately upon revocation, because the bulk revocation script handled any S/N not found (such as the ones with the trimmed zeros) as already revoked (and thus reported no error). Also, none of the 53 affected certificates were included in the sample which was examined immediately after the bulk revocation was performed.
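As a hedged illustration of the alternative behaviour, the sketch below separates "not found" from "already revoked" and raises an error for the former. The dictionary `db`, the status strings, and the names `revoke_all` and `UnknownSerialError` are hypothetical stand-ins and do not describe our actual tooling.

```python
from typing import Dict, Iterable, List, Tuple


class UnknownSerialError(LookupError):
    """Raised when a serial in the revocation list has no matching certificate record."""


def revoke_all(serials: Iterable[str], db: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Revoke every known serial; fail loudly if any serial is unknown.

    `db` maps serial -> "active" or "revoked" (a stand-in for a CA database).
    Returns (newly_revoked, already_revoked).
    """
    revoked, already_revoked, unknown = [], [], []
    for serial in serials:
        status = db.get(serial)
        if status is None:
            unknown.append(serial)          # not found is NOT treated as revoked
        elif status == "revoked":
            already_revoked.append(serial)
        else:
            db[serial] = "revoked"
            revoked.append(serial)
    if unknown:
        raise UnknownSerialError(f"{len(unknown)} serial(s) not found: {unknown}")
    return revoked, already_revoked


db = {"0ca69a5ecbf3ead9bf942e5470b556fd": "active"}
try:
    # Serial with its leading zero trimmed, as in this incident
    revoke_all(["ca69a5ecbf3ead9bf942e5470b556fd"], db)
except UnknownSerialError as exc:
    print("bulk revocation incomplete:", exc)
```

In this sketch the known serials are still revoked before the error is raised, so a single malformed entry does not block the rest of the batch but cannot pass unnoticed either.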

A follow-up check was conducted on the next business day to provide independent confirmation of revocation of the target population. This check resulted in the discovery of the issue and triggered further actions: the internal auditing team was immediately notified and the remaining 53 certificates were revoked within the same day. However, these revocations did not meet the applicable timeline requirements, thus resulting in this incident.

Our analysis indicates that the main underlying cause of this incident was the lack of immediate independent verification of all revoked items. Given the fact that transformations, bulk processing, merging and similar pre-processing tasks are generally prone to errors, the bulk revocation process should always (regardless of the tools used) conclude with an independent and automated verification step for all revoked items.

Our plan is to address the above issue by adding this independent and automated verification step in the bulk revocation procedure. Furthermore, the procedure will specify in more detail the modalities (tools, scripts, etc.) to be followed for bulk revocations and the verification of these revocations.
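A minimal sketch of what such a verification step could look like is given below. It assumes the original target list is available unchanged and that revocation status can be queried from a source independent of the revocation script (for example a CRL); the names `verify_revocation` and `crl_serials` are hypothetical.

```python
from typing import Callable, Iterable, List


def verify_revocation(target_serials: Iterable[str],
                      is_revoked: Callable[[str], bool]) -> List[str]:
    """Return every serial in the target population that is NOT reported as revoked."""
    return [serial for serial in target_serials if not is_revoked(serial)]


# Hypothetical stand-in for an independent status source such as a published CRL
crl_serials = {"0ca69a5ecbf3ead9bf942e5470b556fd"}

target = [
    "0ca69a5ecbf3ead9bf942e5470b556fd",
    "06f59bb38ef7878455e67939cb85facb",
]

missing = verify_revocation(target, lambda s: s in crl_serials)
print(missing)  # ['06f59bb38ef7878455e67939cb85facb']
```

Because every item of the target population is checked, an empty result is the only outcome that confirms the bulk revocation as complete.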

The timeline for the above action plan shall be part of the final report for this issue, which we intend to file within the next week.

Our investigation into this matter has been completed, and this is our final report regarding this incident.

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

The issue was discovered when checking revocation actions for Bug #1750631.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2021-12-30T22:56+00:00 A potential security event is detected and registered internally for the issuance of 4 TLS certificates based on validation methods which were prohibited by SC-45. 2 of them are found to be active (non-expired non-revoked).
2022-01-04T16:03+00:00 Revocation of the 2 affected active certificates.
2022-01-17T08:29+00:00 Completion of the internal investigation, with 657 certificates confirmed to be affected, of which 409 are found to be active (target population set to be revoked). Revocation planned to start on 2022-01-21T16:00+00:00 and be completed before 2022-01-21T18:00+00:00.
2022-01-21T16:00+00:00 The RA Administrators report revocation of the target population has been initiated according to plan.
2022-01-21T17:36+00:00 The RA Administrators report revocation of the target population has been completed according to plan.
2022-01-25T07:27+00:00 A follow-up check on the target population reveals that 53 active certificates were not revoked due to a failure of the bulk revocation script.
2022-01-25T17:52+00:00 Per discussions with our internal Security Auditors, the CA Administrators complete revocation of the pending 53 active certificates.
2022-01-26T11:31+00:00 A follow-up check confirms revocation of all affected active certificates.
2022-01-26 Analysis of the revocation problem with the participation of RA Admins, CA Admins and Security Auditors.
2022-01-26 Draft of initial Bugzilla report initiated.
2022-01-26T21:13+00:00 Update of bug #1750631, informing about the delayed revocation of the 53 certificates and our intention to file a separate bug.
2022-01-28T22:29+00:00 Filed initial Bugzilla report.
2022-02-08T18:00+00:00 Internal meeting with the participation of the internal auditing team and all personnel involved in the revocation process to identify and review the timeline of events that led to this issue. During the meeting, a preliminary assessment of the possible underlying causes and suggested remediation measures is made.
2022-02-14 to 2022-02-25 Ongoing investigation by our internal auditors, analysis and documentation of the events, the sources of the issue and the underlying causes, as required by our Incident Management Policy. The analysis concludes with the suggested remediation measures.
2022-03-02T19:16+00:00 Internal meeting to discuss and finalize the results of the postmortem analysis and the suggested remediation measures, so that such occurrences are not repeated in the future.
2022-03-02T23:26+00:00 Update of the public bug to report our progress.
2022-03-03T17:00+00:00 Start drafting the final Bugzilla report (this document).
2022-03-03 to 2022-03-11 Finalization of the postmortem analysis, the remediation measures and the timeline.
2022-03-11 Filed final Bugzilla report (this document).

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

The certificates in question were intended to be included in a bulk revocation per Bug #1750631. We moved to revoke these certificates immediately upon discovery of this issue, and all were confirmed as revoked (see timeline).

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

Fifty-three (53) certificates, issued between 2021-09-01 and 2022-01-05, were not revoked within the required timeline, as defined in section “4.9.1.1 Reasons for Revoking a Subscriber Certificate” of our CP/CPS.

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

Serial numbers of the affected certificates and their crt.sh IDs (pre-certificates confirmed to be logged to CT):

0ca69a5ecbf3ead9bf942e5470b556fd (https://crt.sh/?id=5144471968)
06f59bb38ef7878455e67939cb85facb (https://crt.sh/?id=5145419976)
039a2817e0c8e7654f0a440d33229109 (https://crt.sh/?id=5145898000)
065b9270a86670e17217335e0e12d302 (https://crt.sh/?id=5146497443)
047744680f23edfc912d57dbd1b8dccf (https://crt.sh/?id=5146691821)
0376c862764763a1dab17d7b76eff14b (https://crt.sh/?id=5147592580)
545d05921ac19925c8aa1fc90f11dc8e (https://crt.sh/?id=5157101311)
0a3dbc89653fcd1920d7b1158717c88b (https://crt.sh/?id=5158429164)
03a102954a7490a26d4badae42fab21d (https://crt.sh/?id=5181609491)
0dadb6bdc70cbd5d23e932b87828dc8e (https://crt.sh/?id=5181629041)
0ba1b861ce11843db49f5182822f0b2d (https://crt.sh/?id=5225189778)
064fba4dda063b1dfc2c5c90919e7068 (https://crt.sh/?id=5225840916)
0f4af8e3c746d08f06b40dbd9eaa1ee0 (https://crt.sh/?id=5260687598)
02494d186fafe553151bf2543b198caf (https://crt.sh/?id=5270608916)
03e9d68dad6e0069ca2efc7e53c4c07f (https://crt.sh/?id=5274511274)
054eba3e0270d98a02a8297dbe74ec91 (https://crt.sh/?id=5289236483)
0b7e11ebcafbe4850613987e6c0b5495 (https://crt.sh/?id=5315615117)
0a6d3b6a3b644cbd81fcc95b63863365 (https://crt.sh/?id=5343783688)
01853aa4bc4a456d13784313eb555d2e (https://crt.sh/?id=5353480741)
07cf614c444efc429bc61cf0f66a34bd (https://crt.sh/?id=5392916102)
076a620c8e57c193b2c3bca50bf872e0 (https://crt.sh/?id=5481123725)
0a03459c6bda89e501f23312ac0aae33 (https://crt.sh/?id=5493561185)
02c800e7ff3e78f81f8c225e26c1c592 (https://crt.sh/?id=5506391688)
08c1aad24c6b0f12ea3c6ae753fa5529 (https://crt.sh/?id=5517401632)
025f16e6c5f577d57c915ef3044702da (https://crt.sh/?id=5523898954)
0960342ac9f912b83148ab3453c70b3c (https://crt.sh/?id=5537847773)
082fab8ebe58d83c9c44ef1dc0318cb5 (https://crt.sh/?id=5567525342)
0303ed64a89521f616b17e9dfe680337 (https://crt.sh/?id=5568301699)
0f2975e9e9ff608d8b803f7816ed63fe (https://crt.sh/?id=5614556801)
07e607db929c9cef09e58846465b137e (https://crt.sh/?id=5614644978)
02edaf44f83743a46808ca30bef41c7c (https://crt.sh/?id=5614975857)
0efa81d80914d9c252a6f391e8964e54 (https://crt.sh/?id=5619338947)
05c2ddfc4c113185c1b90f453dea44a9 (https://crt.sh/?id=5619764487)
0592c892da681da0d610803f9ec44c90 (https://crt.sh/?id=5664456154)
0e2b35ef59bcdd9dc52bd35a5823024a (https://crt.sh/?id=5664606333)
0205dca3d9c03f9fdc0c5aba9f52b607 (https://crt.sh/?id=5665497245)
045e2c1cfba56fde142921bb8884414c (https://crt.sh/?id=5678103204)
05efc948f8581054103008176b030369 (https://crt.sh/?id=5678420774)
0e872f6e9884fa06b461fde09ef9d0d8 (https://crt.sh/?id=5701954492)
030b74b4ae4d5773a03f1112c12a3b6d (https://crt.sh/?id=5708663139)
013f81d0594373708383ad3963f63cd8 (https://crt.sh/?id=5726364506)
0a09fef71f222581b177645020874b7f (https://crt.sh/?id=5729904633)
0af8efe5ab5009cc61b3bdbf0fa3f652 (https://crt.sh/?id=5745435444)
070b2dd03ceaf07491c7d908822867ae (https://crt.sh/?id=5755132169)
045b28cd88180aa16e44fbb37945c299 (https://crt.sh/?id=5791552352)
0c61c694184d718acf635def08413efd (https://crt.sh/?id=5793600616)
093275058e735903ac537d8a1e214d3c (https://crt.sh/?id=5800760376)
0e67968a0ea61e0d77e32b0950693340 (https://crt.sh/?id=5813474482)
07f1554d49bbc37dfb65e448c6375724 (https://crt.sh/?id=5821852381)
095cfcb3422baae2b0da4cb2cf25738e (https://crt.sh/?id=5838950893)
09ea187539b2457bd33f7db5afacdda2 (https://crt.sh/?id=5888327816)
0181a2a67e8c615dc3c33f3c5807370d (https://crt.sh/?id=5917434129)
04ec9ba0db134faa041a21d80fa1aa10 (https://crt.sh/?id=5919527166)

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

This issue involved errors in handling the revocation of certificates which were included in the target population per Bug #1750631.

Our analysis has revealed that the source of the issue was the different formatting used by the RA and CA software components of our PKI system when storing the certificate S/N in their respective databases. During the pre-processing of the certificate population affected by Bug #1750631, the certificate S/Ns were re-formatted to their hexadecimal numeric values instead of being kept in their string representations, which dropped any leading zeros. This affected the sub-population of certificates with a S/N starting with one or more zeros (53 of the original 657 certificates affected by Bug #1750631).

The issue was not discovered immediately upon revocation, because the bulk revocation script handled any S/N not found (such as the ones with the trimmed zeros) as already revoked (and thus reported no error). Also, none of the 53 affected certificates were included in the sample which was examined immediately after the bulk revocation was performed.
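To illustrate why a sample-based check can miss a problem of this size, the sketch below computes the probability that a uniformly random sample drawn without replacement from the 409-certificate target population contains none of the 53 unrevoked certificates. The sample sizes are hypothetical, since the actual sample size is not stated in this report, and `miss_probability` is an illustrative name.

```python
from math import comb


def miss_probability(population: int, defects: int, sample_size: int) -> float:
    """Probability that a random sample (without replacement) contains no defective item."""
    return comb(population - defects, sample_size) / comb(population, sample_size)


# Hypothetical sample sizes; 409 and 53 are the figures from the timeline
for k in (5, 10, 20):
    print(k, round(miss_probability(409, 53, k), 3))
```

Under these assumptions a sample of five certificates misses all 53 just under half of the time, which is why full-population verification is the more reliable check.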

A follow-up check was conducted on the next business day to provide independent confirmation of revocation of the target population. This check resulted in the discovery of the issue and triggered further actions: the internal auditing team was immediately notified and the remaining 53 certificates were revoked within the same day. However, these revocations did not meet the applicable timeline requirements, thus resulting in this incident.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Immediate actions are described in steps 2-5 of this report and include the immediate involvement of our internal auditing department and revocation of the affected certificates within the same day. Follow-up actions included the documentation of the events that led to the issue, the analysis of its sources, the discovery of underlying causes and the identification of remediation measures, as required by our Incident Management Policy.

Our analysis (summarized in section 6 of this report) indicates that the main underlying cause of this incident was the lack of immediate independent verification of all revoked items. Given the fact that transformations, bulk processing, merging and similar pre-processing tasks are generally prone to errors, the bulk revocation process should always (regardless of the tools used) conclude with an independent and automated verification step for all revoked items.

Our plan is to address the above issue by adding this independent and automated verification step in the bulk revocation procedure. Furthermore, the procedure will specify in more detail the modalities (tools, scripts, etc.) to be followed for bulk revocations and the verification of these revocations.

The procedure and the relevant tools shall be finalized and rolled out before the end of April 2022.

This is an update to report current progress on outstanding items for this issue:

An update to the Bulk Revocation Procedure has been drafted which:

  • Includes an independent, automated verification step, and
  • Refers to specific approved modalities (tools, scripts, etc.) for the execution and verification steps of bulk revocations.

The relevant tools have been prepared by our engineers at the RA and CA level. These tools are currently passing through testing and compliance review before they will be considered approved to execute and verify bulk revocations.

We intend to perform final review and approval of both the Bulk Revocation Procedure and the tools developed to support it as a unified step. This is an additional quality check we have added to ensure the effectiveness and reliability of the entire bulk revocation process, and to confirm the intended interaction between the Bulk Revocation Procedure and these tools.

Our plan is to complete the above actions within the next two weeks.

This is an update to report current progress on outstanding items for this issue.

Since our previous update, we have completed the testing and review of the scripts developed for the verification of bulk revocations. We are now testing and reviewing the respective tools developed for the execution of bulk revocations and updating our documentation accordingly.

We will update this bug next week based on our progress.

This is an update to report current progress on outstanding items for this issue.

Since our previous update, we have completed the testing of the tools developed for the execution of bulk revocations and updated our documentation accordingly. With the two main components of the process ready for use, our next step is to complete end-to-end testing before signing off on this procedure.

We will update this bug next week based on our progress.

This is an update to report our progress on outstanding items for this issue.

End-to-end testing and review of the process have been successfully completed. With the completion of all quality checks, the documented Bulk Revocation Procedure and the relevant tools have been finalized.

This concludes our remediation actions for this bug.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [leaf-revocation-delay]