Closed Bug 1904402 Opened 6 months ago Closed 2 months ago

CommScope: Incomplete Incident Report

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: agwa-bugs, Assigned: nicol.so)

Details

(Whiteboard: [ca-compliance] [policy-failure])

In Bug 1852404 Comment 7, CommScope stated (emphasis added):

Including certificates that had expired or been revoked at the point when we became aware of the issue, a total of 30 certificates with the problem have been issued. The first of the certificate was issued on 2022-07-25, the last 2023-08-14.

However, I have identified 36 certificates with this problem:

https://crt.sh/?sha256=05c6ab8929e2227d78ebb91fc51f521308b89d6f7d696105dffcc53f3c7d149c
https://crt.sh/?sha256=0710ad73a6804085dafb7dc8e43b19f24254aee0bbd50a86472c6a2cd739e3e7
https://crt.sh/?sha256=0bcff18a9c9cd429b83caf737a639a28808039ee85daa463b4ab86926ea8647f
https://crt.sh/?sha256=0e408f87b5cfdc1f1cfc95a0a6271cee577bbf52bc3466d975557aaeab517109
https://crt.sh/?sha256=147438d30201df1759f24453d0a4f74004f78159eb92a2f42df3c33a732810e3
https://crt.sh/?sha256=14d1fc5c43b6ab71e62cc4d232033a0b2db3d2833ab5ec7aaa66aea22f5bd5a4
https://crt.sh/?sha256=23b3688eb98adad99d56e311c677e301c045c8f7ddb2ec6a85ad178102f230b7
https://crt.sh/?sha256=259a2f394d1514af5831bd0993767c73c1ec9bf822ee445fe2668b7a17fb0632
https://crt.sh/?sha256=28c4f324a54915a0ef0fda3707e40e855b14d53acdfa726a515000b8316d4793
https://crt.sh/?sha256=3664d3f9ad5aebb6fe9b3bc73b5b65a950a9908b325f0e4fc4c0fa34741ff178
https://crt.sh/?sha256=38085f60623b3593e9c8dbc27fb8f2cb20b259295863948f17b8a56ea88af00f
https://crt.sh/?sha256=3a8a4ff7f0b6a4bf2d2904510110771fb0767fba8e2087ebe8cb222e1e0675a0
https://crt.sh/?sha256=3c9f6645daab08f7b2ce8abf4382aae2196d5f16637f36a46807515837da5437
https://crt.sh/?sha256=40781a09b834080b72cac38b372c6715cc7e1aa9b00712a3fa9840d618469205
https://crt.sh/?sha256=4f0b747c8a452c78a4f4c21e2cfb5af9ba94106a861a09c50bb7ca47be76d646
https://crt.sh/?sha256=59a51c29f40b04e8c717d3bbeb48020f23e53d46e9060d8b7aab40d3ad5ebb06
https://crt.sh/?sha256=5a6fa588225155f83d9ca382265d0f1e50df34a33a66b67e3ebfb2d7707eefb2
https://crt.sh/?sha256=5b7744444dac0f04f71be489d4dd31f2fe3296b6b380068511f5b3a4f60bf2d7
https://crt.sh/?sha256=6272cb47728081b48ac64c8da659c95ab9acd8b6a243909f8d82f15a6ab9f9fc
https://crt.sh/?sha256=7958567506ca778c11a6d553db50db7d9e89d8f3dc96b52322e2f564f9354fbf
https://crt.sh/?sha256=7f8b732e783fc7e32e5eb7e639f9f1437bd1bbda7276458f72b7c41b06581e5e
https://crt.sh/?sha256=83090185395455675cd133ce408f87fe348dd74b1f93b5ba85788abb92199964
https://crt.sh/?sha256=87c47f512a80e7c2dcf8f004c09dc06b1d259daec6bd6477ada0b16f63bcfcfb
https://crt.sh/?sha256=aadaa66b0fde5863bd8ac8043da39e14c1f8f3adf2fed561d2a00e3172492ba4
https://crt.sh/?sha256=ad1f7daa8dbf5cd8a30cdaf66c3307c41deea91081f45c6ce7bc624b8dacd5b9
https://crt.sh/?sha256=b00a2af08c3ed09ba639c3d5765517838af11fecc27b2ffb93126b37754b67af
https://crt.sh/?sha256=b0f18184d69790b3da328eec3d47c4611e04a2c8854b2566457b68719c58ab4d
https://crt.sh/?sha256=b73a749b14e45cd618358dec02c21cb4622467acdf6daf8ddd7aac93047dce0b
https://crt.sh/?sha256=c3a63c74d6c209958048f46f22960c3ea8678817c4c41b0c85db4b9f5ff7378c
https://crt.sh/?sha256=d083da050cbf9d5556fe64b4d45421b4bde503b25b478847682ffc749657a574
https://crt.sh/?sha256=d2e6b7cc797a175e5c59d64d561d4ebe56d311d5582259e0a5154b50d8e00f0c
https://crt.sh/?sha256=ddf00d12805f1bbc36f84365fb8df58c63df8c71eed1b4da173d19cf55173003
https://crt.sh/?sha256=e44e5d34a68471655e3c65f382aca983b8c32e7110306ef0e7ed25501b39272b
https://crt.sh/?sha256=e5825957944bf85d35562a6b042faee7dbda55a69ed18c6f442474b6257016b3
https://crt.sh/?sha256=ed5df8d326dd6e61cf03ff99809cdd93bbdeb51380492daa44bea07f20e3ae8e
https://crt.sh/?sha256=fda73bcc2d06a2fc5d52f577d420b8459782816753e2b4070cc5cb959f550251

Additionally, that incident report failed to include the complete certificate data.

Flags: needinfo?(nicol.so)

I can confirm this is still true, and I'm not seeing a response from Commscope regarding this Certificate Problem Report over 4 days in?
https://crt.sh/?sha256=0710ad73a6804085dafb7dc8e43b19f24254aee0bbd50a86472c6a2cd739e3e7

$ zlint 13504276836.crt
"e_empty_sct_list": {
"result": "error",
"details": "At least one SCT MUST be included in the SignedCertificateTimestampList extension"
},

$ lint_cabf_serverauth_cert lint -s ERROR -d 13504276836.crt
SctListElementCountValidator @ certificate.tbsCertificate.extensions.7.extnValue.signedCertificateTimestampList
pkix.sct_list_empty (ERROR)
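Both linters flag the same condition: the SignedCertificateTimestampList extension (OID 1.3.6.1.4.1.11129.2.4.2) is present but contains zero SCTs. The following standard-library Python sketch is a rough illustration of what that check involves, not how zlint or the other linter implement it. It counts the SCTs in the extension value and assumes the caller has already extracted the raw contents of the extension's extnValue. Per RFC 6962 section 3.3, those contents are the DER encoding of an OCTET STRING that wraps the TLS-encoded list. The function names are hypothetical.

```python
"""Count SCTs in an RFC 6962 SignedCertificateTimestampList extension value.

Sketch only: assumes the caller already located the extension with OID
1.3.6.1.4.1.11129.2.4.2 and passes the raw contents of its extnValue,
which is itself a DER OCTET STRING wrapping the TLS-encoded list.
"""


def _read_der_octet_string(data: bytes) -> bytes:
    """Return the contents of a single DER OCTET STRING (tag 0x04)."""
    if len(data) < 2 or data[0] != 0x04:
        raise ValueError("expected a DER OCTET STRING")
    length = data[1]
    offset = 2
    if length & 0x80:  # long-form length: low 7 bits give the byte count
        num_bytes = length & 0x7F
        if num_bytes == 0 or len(data) < 2 + num_bytes:
            raise ValueError("bad DER length")
        length = int.from_bytes(data[2:2 + num_bytes], "big")
        offset = 2 + num_bytes
    if len(data) != offset + length:
        raise ValueError("DER length does not match data")
    return data[offset:]


def count_scts(extn_value: bytes) -> int:
    """Return the number of serialized SCTs in the extension value."""
    tls = _read_der_octet_string(extn_value)
    if len(tls) < 2:
        raise ValueError("truncated SignedCertificateTimestampList")
    list_len = int.from_bytes(tls[:2], "big")  # uint16 total list length
    body = tls[2:]
    if len(body) != list_len:
        raise ValueError("list length does not match data")
    count, pos = 0, 0
    while pos < len(body):
        if pos + 2 > len(body):
            raise ValueError("truncated SCT length")
        sct_len = int.from_bytes(body[pos:pos + 2], "big")  # uint16 per SCT
        pos += 2 + sct_len
        if pos > len(body):
            raise ValueError("truncated SCT")
        count += 1
    return count


def has_empty_sct_list(extn_value: bytes) -> bool:
    """True when the extension is present but carries no SCTs.

    RFC 6962 defines the list as <1..2^16-1>, so a zero-length list is
    exactly the condition the linters report as an error.
    """
    return count_scts(extn_value) == 0
```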

(In reply to Andrew Ayer from comment #0)

However, I have identified 36 certificates with this problem:

https://crt.sh/?sha256=05c6ab8929e2227d78ebb91fc51f521308b89d6f7d696105dffcc53f3c7d149c
https://crt.sh/?sha256=0710ad73a6804085dafb7dc8e43b19f24254aee0bbd50a86472c6a2cd739e3e7
[...]

Additionally, that incident report failed to include the complete certificate data.

Thanks for the information. We made a mistake when filtering certificates for the empty SCT extension problem, leading to a miscount and an incorrect date for the earliest certificate affected by the problem. The 6 additional certificates you referenced but we didn’t include in our previous report were all issued on 2021-08-17. We will add the new information to case 1852404.

Flags: needinfo?(nicol.so)

(In reply to Wayne from comment #1)

I can confirm this is still true, and I'm not seeing a response from Commscope regarding this Certificate Problem Report over 4 days in?
https://crt.sh/?sha256=0710ad73a6804085dafb7dc8e43b19f24254aee0bbd50a86472c6a2cd739e3e7

We previously stated in Bug 1852404 comment 7 that

The first of the [affected] certificate was issued on 2022-07-25, the last 2023-08-14.

We now know that the first of the affected certificates was issued on 2021-08-17. The certificate you referenced was issued before 2023-08-14. It was among the 30 affected certificates whose existence we previously disclosed, a set we now know was missing 6 certificates.

Assignee: nobody → nicol.so
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [policy-failure]

(In reply to Nicol So from comment #2)

We will add the new information to case 1852404.

The information provided by Andrew Ayer was added to Bug 1852404 on 2024-07-01.

It has been more than 3 weeks since our last reply to an information request. There have been no follow-up questions, and we have no outstanding action items. CommScope would like to request that the issue be treated as resolved and closed.

Flags: needinfo?(bwilson)

Does another incident report need to be filed here, or should this have been marked as a duplicate, or should it have been closed when we closed Bug #1852404?

The main reason I filed a new bug was because I couldn't comment on Bug 1852404, it having been closed 9 months ago.

That said, I think there is value in understanding why not all the affected certificates were found, and to identify any lessons learned about how to ensure future incident reports find all affected certificates. So far, CommScope hasn't provided any information beyond that they "made a mistake". Filing an incident report would provide helpful insights.

Flags: needinfo?(bwilson) → needinfo?(nicol.so)

(In reply to Ben Wilson from comment #6)

Does another incident report need to be filed here, or should this have been marked as a duplicate, or should it have been closed when we closed Bug #1852404?

We are working on an incident report covering the six affected certificates that were missed in Bug 1852404, which is the subject of this case. We expect the report to be available within a week and will post it here.

Incident Report

Summary

Six certificates are missing from an incident report filed in Bug 1852404.

Impact

The incident report in Bug 1852404 includes an incomplete set of affected certificates.

Timeline

All times are UTC-7.

2021-08-17:

  • 14:30 CA application reconfigured to require 0 SCTs to be included in issued certificates. This was in response to unexpected errors encountered when attempting to publish precertificates to the configured CT log. The configuration change was to be reverted once the issue with publication to CT logs had been resolved. These configuration changes were not recorded in the request management system as required by procedure.
  • 14:31 Six pending certificates with empty SCT extensions were issued.

2022-05-10:

  • CA application configuration changed to require 0 SCTs to be included in issued certificates, for a second time. The configuration change was logged in the request management system.

2023-09-05:

  • 03:39 Timestamp of the problem report, received from an external reporter, about empty SCT extensions in our certificates
  • 09:03 The certificates in the problem report were examined. The reported problem was confirmed.

2023-09-08:

  • A draft final report for Bug 1852404 was prepared and circulated for review. The draft incorporated the undercounted set of affected certificates.

2023-09-09:

  • 12:34 An incident report was posted in Bug 1852404. The report states that a total of 30 certificates with the problem have been issued.

2024-06-24:

  • 11:20 The present case (Bug 1904402) was created by an external reporter.

Root Cause Analysis

In preparing the final incident report in Bug 1852404, six certificates were not included in the count of affected certificates because an incorrect date range was used to retrieve the certificate records. The date range was derived from the formally documented configuration changes, which indicated that all affected certificates had been issued between 2022-05-10 and the time the empty SCT extension problem was reported. However, the temporary reconfigurations of the CA application that began on 2021-08-17 were not recorded in the request management system as required by procedure; this omission is the root cause of the error.
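The failure mode described here is scoping affected certificates by a date range inferred from change records rather than by the defect itself. A small Python sketch can illustrate the difference. The `IssuedCert` record type, its field names, and the sample records are hypothetical. The window dates in the usage example echo the timeline above purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IssuedCert:
    """Hypothetical issuance record; field names are illustrative only."""
    sha256: str
    issued: date
    sct_count: int  # number of SCTs embedded in the issued certificate


def affected_by_date_window(certs, start, end):
    """Scope by an assumed window: anything issued outside it is missed."""
    return [c for c in certs if start <= c.issued <= end and c.sct_count == 0]


def affected_by_property(certs):
    """Scope by the defect itself across the full issuance history."""
    return [c for c in certs if c.sct_count == 0]


def missed(certs, start, end):
    """Defective certificates that the date-window query fails to return."""
    windowed = set(affected_by_date_window(certs, start, end))
    return [c for c in affected_by_property(certs) if c not in windowed]


if __name__ == "__main__":
    # Hypothetical records, not real CommScope data.
    certs = [
        IssuedCert("aa11", date(2021, 8, 17), 0),
        IssuedCert("bb22", date(2022, 6, 1), 0),
        IssuedCert("cc33", date(2022, 6, 2), 2),
    ]
    gaps = missed(certs, date(2022, 5, 10), date(2023, 9, 5))
    print([c.sha256 for c in gaps])  # -> ['aa11']
```

The property-based query does not depend on the change records being complete. The date-window query is only as reliable as the records it was derived from, which is exactly where this incident's count went wrong.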

A contributing factor to the non-compliance is that the request management system involved had been newly adopted to support CommScope's public CA operations, and the system and its associated procedures were new to the staff at the time. Through experience and periodic training, our operational and administrative personnel are now fully competent with the procedures and requirements of the public CAs, as evidenced by the formally documented change on 2022-05-10.

Lessons Learned

Changes to system configurations must be recorded accurately and consistently in order to support configuration management and investigation of system issues. Application administrators must be adequately trained to record all configuration changes according to the adopted change management process.

What went well

  • (Nothing noteworthy)

What didn't go well

  • A non-compliance in recording a temporary configuration change in the request management system resulted in an inaccurate accounting of the affected certificates in an incident report.

Where we got lucky

  • There was no significant impact to certificate users.

Action Items

Action Item | Kind | Due Date
None as a result of this incident (periodic training of personnel on the procedures and practices related to CommScope's public CAs was already an established practice, and had already been improved, before this incident) | -- | N/A

Appendix

Details of affected certificates

https://crt.sh/?sha256=3664d3f9ad5aebb6fe9b3bc73b5b65a950a9908b325f0e4fc4c0fa34741ff178
https://crt.sh/?sha256=b0f18184d69790b3da328eec3d47c4611e04a2c8854b2566457b68719c58ab4d
https://crt.sh/?sha256=b73a749b14e45cd618358dec02c21cb4622467acdf6daf8ddd7aac93047dce0b
https://crt.sh/?sha256=7958567506ca778c11a6d553db50db7d9e89d8f3dc96b52322e2f564f9354fbf
https://crt.sh/?sha256=aadaa66b0fde5863bd8ac8043da39e14c1f8f3adf2fed561d2a00e3172492ba4
https://crt.sh/?sha256=38085f60623b3593e9c8dbc27fb8f2cb20b259295863948f17b8a56ea88af00f

Flags: needinfo?(nicol.so)

It has been 13 days since our last comment in response to an information request. We have no outstanding action items and there have been no follow-up questions. CommScope would like to request that the issue be treated as resolved and closed.

Flags: needinfo?(bwilson)

I'll look to close this on or about Friday, 23-Aug-2024.

(In reply to Nicol So from comment #9)

Root Cause Analysis

In preparing the final incident report in Bug 1852404, six certificates were not included in the count of affected certificates because an incorrect date range was used to retrieve the certificate records. The date range was derived from the formally documented configuration changes, which indicated that all affected certificates had been issued between 2022-05-10 and the time the empty SCT extension problem was reported. However, the temporary reconfigurations of the CA application that began on 2021-08-17 were not recorded in the request management system as required by procedure; this omission is the root cause of the error.

A contributing factor to the non-compliance is that the request management system involved had been newly adopted to support CommScope's public CA operations, and the system and its associated procedures were new to the staff at the time. Through experience and periodic training, our operational and administrative personnel are now fully competent with the procedures and requirements of the public CAs, as evidenced by the formally documented change on 2022-05-10.

A number of things here:

  1. undocumented changes to configuration seem, to me, to be a much bigger deal than a query error! what can be done specifically to ensure that they are not made in the future, beyond relying on human correctness? perhaps the configuration system can be made to require an ID for an entry in the request management system, or the request management system made to actually control the configuration

  2. where was this configuration change recorded, and why was that place not consulted as part of a double check on the analysis that the date range was correct for remediation of the misissued certificates? there seems to be a second possible source of truth, and it would be good to make it policy to check it as well, even if policy also requires that they never differ; this would mean that two operational failures rather than one were required for this to reoccur

  3. training was part of CommScope’s practices before, as they say elsewhere in this bug, but was not sufficient to prevent this from happening before. what specifically has been changed about the training to make it more effective in this regard? what should other CAs be checking to make sure their training covers this area effectively?

  4. how will CommScope determine if training was effective? what would tell CommScope that their training wasn’t effective enough and needed further improvement, short of another incident of misrecorded change? “they got the next one right” doesn’t seem to be sufficient, since presumably CommScope staff correctly executed many changes previous to this incident
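The first suggestion, making the configuration system itself refuse changes that lack an approved request-management entry, could look roughly like the following Python sketch. `ChangeRequestRegistry`, `GuardedConfig`, `UnapprovedChangeError`, and the key `min_embedded_scts` are all hypothetical names. Nothing here describes how CommScope's CA application or request management system actually works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class UnapprovedChangeError(Exception):
    """Raised when a configuration change has no approved change request."""


@dataclass
class ChangeRequestRegistry:
    """Hypothetical stand-in for a request management system."""
    approved: set = field(default_factory=set)

    def is_approved(self, change_id: str) -> bool:
        return change_id in self.approved


@dataclass
class GuardedConfig:
    """Configuration store that rejects writes lacking an approved request ID."""
    registry: ChangeRequestRegistry
    values: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def apply_change(self, key: str, value, change_id: str) -> None:
        # The write itself is gated on the request ID, so an unrecorded
        # change cannot be deployed, rather than relying on staff to log it.
        if not change_id or not self.registry.is_approved(change_id):
            raise UnapprovedChangeError(
                f"change to {key!r} rejected: no approved request {change_id!r}"
            )
        old = self.values.get(key)
        self.values[key] = value
        # Every accepted change is tied to its request ID at write time.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "key": key,
            "old": old,
            "new": value,
            "change_id": change_id,
        })
```

With this design, a temporary change such as the one on 2021-08-17 would either fail outright or leave an audit entry naming its request, removing the dependence on someone remembering to record it afterwards.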

Flags: needinfo?(bwilson) → needinfo?(nicol.so)

(In reply to Mike Shaver (:shaver emeritus) from comment #12)

(In reply to Nicol So from comment #9)

Root Cause Analysis

A contributing factor to the non-compliance is that the request management system involved had been newly adopted to support CommScope's public CA operations, and the system and its associated procedures were new to the staff at the time. Through experience and periodic training, our operational and administrative personnel are now fully competent with the procedures and requirements of the public CAs, as evidenced by the formally documented change on 2022-05-10.

A number of things here:

  1. undocumented changes to configuration seem, to me, to be a much bigger deal than a query error! what can be done specifically to ensure that they are not made in the future, beyond relying on human correctness? perhaps the configuration system can be made to require an ID for an entry in the request management system, or the request management system made to actually control the configuration

First of all, thank you for your comments. I want to emphasize that the unrecorded configuration change happened more than 3 years ago, about 2.5 months after our public CAs started issuing subscriber certificates (it was only the second batch of certificates our public CAs issued). The incident was more of an anomaly, because our personnel had not yet had much experience with the procedure and the then-recently-adopted request management system. Based on the experience we have accumulated since, the process for documenting configuration changes is not particularly onerous or difficult for our personnel to follow consistently.

  2. where was this configuration change recorded, and why was that place not consulted as part of a double check on the analysis that the date range was correct for remediation of the misissued certificates? there seems to be a second possible source of truth, and it would be good to make it policy to check it as well, even if policy also requires that they never differ; this would mean that two operational failures rather than one were required for this to reoccur

The configuration change and the requisite approval were required to be recorded in the request management system before deployment. The configuration change did produce entries in an application log, but that log is intended for troubleshooting and forensic investigation, not for change request management or tracking; the configuration change and its timing were determined forensically during our investigation.
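The reply above notes that the application log captured the change even though the request management system did not. The second question proposes treating the two as independent sources of truth. A periodic reconciliation between them might be sketched in Python as follows. The record shapes, field names, and 15-minute slack are assumptions for illustration, and parsing a real application log is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LoggedChange:
    """Hypothetical configuration event parsed from an application log."""
    time: datetime
    key: str


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical approved entry in the request management system."""
    change_id: str
    key: str
    window_start: datetime
    window_end: datetime


def unrecorded_changes(logged, records, slack=timedelta(minutes=15)):
    """Return logged changes that no change record covers.

    A logged change counts as covered when some record names the same key
    and the change time falls inside that record's approved window,
    widened on both sides by `slack` to tolerate clock skew.
    """
    uncovered = []
    for ev in logged:
        covered = any(
            r.key == ev.key
            and r.window_start - slack <= ev.time <= r.window_end + slack
            for r in records
        )
        if not covered:
            uncovered.append(ev)
    return uncovered
```

Any non-empty result would be a discrepancy to investigate. Running such a check routinely, and again before finalizing an incident's scope, means a missed entry in one source has to coincide with a missed entry in the other before it goes unnoticed.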

  3. training was part of CommScope’s practices before, as they say elsewhere in this bug, but was not sufficient to prevent this from happening before. what specifically has been changed about the training to make it more effective in this regard? what should other CAs be checking to make sure their training covers this area effectively?

The improvement in compliance came from two sources. One is the experience of personnel with the procedures. The other is the periodic training in which the need to document any events not automatically logged has been clarified, emphasized, and impressed on CA personnel over the last 3+ years.

  4. how will CommScope determine if training was effective? what would tell CommScope that their training wasn’t effective enough and needed further improvement, short of another incident of misrecorded change? “they got the next one right” doesn’t seem to be sufficient, since presumably CommScope staff correctly executed many changes previous to this incident

CommScope has been through 3 full cycles of WebTrust audits covering its public CAs. Our external auditor is familiar with our policies and practices that apply to our public CAs. The fact that our auditor did not raise undocumented configuration changes as a concern provides a degree of confidence that our public CA processes have been followed.

Flags: needinfo?(nicol.so)

It has been 20 days since our last comment in response to an information request. We have no outstanding action items and there have been no follow-up questions. CommScope would like to request that the issue be treated as resolved and closed.

Flags: needinfo?(bwilson)

I'll close this on or about Friday, 20-Sept-2024, unless there are additional questions to answer.

Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED