Closed Bug 1881364 Opened 8 months ago Closed 7 months ago

Digicert: SMIME certificate with unvalidated information

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: martin.sullivan, Assigned: martin.sullivan)

Details

(Whiteboard: [ca-compliance] [smime-misissuance] )

Attachments

(1 file)

On the 20th of February, DigiCert was made aware via our internal checks of a single S/MIME certificate issued containing unvalidated information.
This certificate has been revoked.
The investigation into the cause is still ongoing, and we expect to have a full report by EOD 28th February.

Whiteboard: [ca-compliance] [smime-misissuance]

Steps to reproduce:

SUMMARY
On the 20th of February, DigiCert’s internal checks discovered one issued S/MIME certificate that included unvalidated information and was missing the S/MIME BR policy OID.
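As an illustration of the kind of check involved (not part of our production tooling), here is a minimal sketch using the Python cryptography library that flags a certificate whose Certificate Policies extension carries no OID under the CA/Browser Forum S/MIME policy arc 2.23.140.1.5; the file name is hypothetical and the check is not a full BR compliance test.

```python
from cryptography import x509

# CA/Browser Forum S/MIME BR certificate policy arc.
SMIME_BR_POLICY_ARC = "2.23.140.1.5"

def has_smime_br_policy(pem_path: str) -> bool:
    """Return True if the certificate asserts any S/MIME BR policy OID."""
    with open(pem_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    try:
        policies = cert.extensions.get_extension_for_class(x509.CertificatePolicies).value
    except x509.ExtensionNotFound:
        return False
    return any(
        p.policy_identifier.dotted_string.startswith(SMIME_BR_POLICY_ARC)
        for p in policies
    )

if __name__ == "__main__":
    # "cert.pem" is a hypothetical file name, not the affected certificate.
    print("S/MIME BR policy OID present:", has_smime_br_policy("cert.pem"))
```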

IMPACT
1 SMIME certificate

TIMELINE

01:11 16th Feb 2024 – Certificate was issued.
16:26 20th Feb 2024 – Internal checks identified the certificate.
16:38 20th Feb 2024 – Certificate was revoked.

ROOT CAUSE ANALYSIS

During system patching, our MPKI8 disaster recovery system activated. This particular disaster recovery system was running older code – code that predated the S/MIME BRs. On 16th Feb 2024, while investigating a customer issue, we discovered that a system had a hosts table entry pointing to the disaster recovery endpoint instead of the production system. This disaster recovery system issued the certificate. When reviewing the logs, we found the disaster recovery system had issued only one certificate before operations resumed at the production site. No certificates could be issued to external requesters from the disaster recovery site because of the firewall rules in place.
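As an illustration only (the hostname and addresses below are hypothetical placeholders, not our production values), a check like the following could flag a hosts table entry that redirects an issuance endpoint away from production:

```python
# Flag /etc/hosts entries that point a known issuance hostname at an
# unexpected address (e.g. a disaster recovery endpoint instead of production).
# Hostname and expected IPs are illustrative placeholders.
EXPECTED = {"issuance.example.internal": {"10.0.1.10"}}

def hosts_overrides(path: str = "/etc/hosts") -> list[str]:
    findings = []
    with open(path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
            if not line:
                continue
            ip, *names = line.split()
            for name in names:
                expected = EXPECTED.get(name)
                if expected is not None and ip not in expected:
                    findings.append(f"{name} -> {ip} (expected {sorted(expected)})")
    return findings

if __name__ == "__main__":
    for finding in hosts_overrides():
        print("ALERT:", finding)
```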

LESSONS LEARNED

WHAT WENT WELL

DigiCert’s internal checks caught this quickly.

The firewalls protecting this system blocked all external access to it.

After system scans, no other certificates were found.

WHAT DIDN'T GO WELL

Some DR instances were running older code.

WHERE WE GOT LUCKY

ACTION ITEMS
Version monitoring for all prod and DR instances – completed 22nd Feb

Uptime checks on DR instances – completed 23rd Feb
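As a rough sketch of what these two checks could look like (the version endpoints below are hypothetical, not our actual monitoring targets):

```python
import requests  # third-party HTTP client

# Hypothetical version endpoints for the production and DR instances.
INSTANCES = {
    "prod": "https://issuance.example.internal/version",
    "dr": "https://issuance-dr.example.internal/version",
}

def check_instances() -> list[str]:
    alerts, versions = [], {}
    for name, url in INSTANCES.items():
        try:
            resp = requests.get(url, timeout=5)
            resp.raise_for_status()
            versions[name] = resp.json()["version"]
        except requests.RequestException as exc:
            alerts.append(f"{name} unreachable: {exc}")  # uptime check
    if len(set(versions.values())) > 1:
        alerts.append(f"version drift between instances: {versions}")  # version check
    return alerts

if __name__ == "__main__":
    for alert in check_instances():
        print("ALERT:", alert)
```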

APPENDIX
DETAILS OF AFFECTED CERTIFICATES

See attached CRT

Assignee: nobody → martin.sullivan
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

DigiCert is monitoring this for any questions; however, as all remediation has been completed, if there are no further follow-ups, Ben, are we OK to close?

Flags: needinfo?(bwilson)

Hi Martin,

I am trying to see some additional value that the community can get from this incident report, and perhaps you can provide some more details on the update procedures that cover the disaster recovery code and instances. I'm also not sure if this is the complete incident report or a preliminary one; the community is used to seeing more detailed incident reports lately :)

From the timeline, it is not clear when Digicert fixed the issue (updated the code in the DR site to match the production site).

The S/MIME BRs became effective on September 1, 2023 and you say that the disaster recovery code was not updated for months. Does this affect all DR instances that Digicert uses? Does Digicert have only one DR instance that takes over when the main instance gets updated?

What is the code sync process between the production and the DR instance? How soon does the sync take place after changes in the production code? Are there any tests included when the code changes to ensure the update is effective?

I also don't fully understand the firewalls issue. Firewalls are typically supposed to block external access, but this is the CA's code that tried to update the system. Why wasn't this update failure caught sooner?

The statement "After system scans no other certificates were found" sounds more appropriate for the "WHERE WE GOT LUCKY" section which is indeed a surprise considering that Digicert issues a lot of certificates :-)

Thank you for the report.

Hi Dimitris,

Thank you for the additional questions. We fixed the issue on Feb 20th and added it to our automated DR sync.

The S/MIME BRs became effective on September 1, 2023 and you say that the disaster recovery code was not updated for months. Does this affect all DR instances that Digicert uses? Does Digicert have only one DR instance that takes over when the main instance gets updated?

The only impacted DR system is the one mentioned in the bug. This particular DR system was designed for use with our US federal system. With that no longer applicable, the firewall was configured to block all incoming traffic, which prevented any external traffic from requesting a certificate. However, an internal certificate request slipped past the firewall. The DR systems on our primary issuing systems were not impacted and automatically sync when new code is deployed.

What is the code sync process between the production and the DR instance? How soon does the sync take place after changes in the production code? Are there any tests included when the code changes to ensure the update is effective?

DR code and production code are deployed at the same time. There is a test to ensure each deployment is successful. Databases are continuously synced to keep the data current. We mostly use hot sites.

I also don't fully understand the firewalls issue. Firewalls are typically supposed to block external access, but this is the CA's code that tried to update the system. Why wasn't this update failure caught sooner?

Firewalls can be configured to block any traffic, depending on where the firewall sits in the infrastructure. The DR system never activated outside of DR tests before this incident. This system is also not our primary issuance system and is earmarked for deprecation later this year.

The statement "After system scans no other certificates were found" sounds more appropriate for the "WHERE WE GOT LUCKY" section which is indeed a surprise considering that Digicert issues a lot of certificates :-)

I think it was more unlucky that this one cert got through, given the firewall configuration and the fact that only one certificate was requested, even during previous DR exercises. I would not categorize it as lucky at all.

(In reply to Jeremy Rowley from comment #5)

Hi Dimitris,

Thank you for the additional questions. We fixed the issue on Feb 20th and added it to our automated DR sync.

Thanks Jeremy,

The S/MIME BRs became effective on September 1, 2023 and you say that the disaster recovery code was not updated for months. Does this affect all DR instances that Digicert uses? Does Digicert have only one DR instance that takes over when the main instance gets updated?

The only impacted DR system is the one mentioned in the bug. This particular DR system was designed for use with our US federal system. With that no longer applicable, the firewall was configured to block all incoming traffic, which prevented any external traffic from requesting a certificate. However, an internal certificate request slipped past the firewall. The DR systems on our primary issuing systems were not impacted and automatically sync when new code is deployed.

This is clearer now, thank you. It seems that relying on firewall rules to decommission a certificate-issuing system may have been a contributing factor to this incident, and perhaps the community can learn that this may not be sufficient.
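To make that concrete, a decommissioned or firewalled-off issuing endpoint could also be probed periodically from hosts inside the CA network; a minimal sketch, with a hypothetical hostname and port (not taken from this report):

```python
import socket

# Hypothetical endpoint of a decommissioned or firewalled-off issuance system.
ENDPOINT = ("issuance-dr.example.internal", 443)

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if is_reachable(*ENDPOINT):
        print("ALERT: decommissioned issuance endpoint still reachable from inside the network")
    else:
        print("OK: endpoint not reachable from this internal host")
```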

What is the code sync process between the production and the DR instance? How soon does the sync take place after changes in the production code? Are there any tests included when the code changes to ensure the update is effective?

DR code and production code are deployed at the same time. There is a test to ensure each deployment is successful. Databases are continuously synced to keep the data current. We mostly use hot sites.

+1

I also don't fully understand the firewalls issue. Firewalls are typically supposed to block external access, but this is the CA's code that tried to update the system. Why wasn't this update failure caught sooner?

Firewalls can be configured to block any traffic, depending on where the firewall sits in the infrastructure. The DR system never activated outside of DR tests before this incident. This system is also not our primary issuance system and is earmarked for deprecation later this year.

Yes, I am quite aware of how firewalls work; I interpreted "external" the same way you did, as "non-CA-originating" traffic. My original understanding from the report was that the existing firewall prevented the DR system from being updated with the new code that complies with the SMBRs, which is why I asked why you didn't detect the failed attempts to update this DR system :)

The statement "After system scans no other certificates were found" sounds more appropriate for the "WHERE WE GOT LUCKY" section which is indeed a surprise considering that Digicert issues a lot of certificates :-)

I think it was more unlucky that this one cert got through, given the firewall configuration and the fact that only one certificate was requested, even during previous DR exercises. I would not categorize it as lucky at all.

Depending on the perspective, I guess it could be both. I consider it "luck" that only one system managed to switch to that DR instance with old code, issuing only one certificate, when it could have been multiple systems, all configured the same way as the one using the "hosts" file, issuing multiple certificates. I hope this makes sense.

I have no further questions about this bug and thank you for the prompt and clear response.

DigiCert is monitoring this for any further questions.

Hi Ben, are we OK to close this now?
Or are there any further questions?

I'll close this on Friday, 29-Mar-2024, unless there are additional questions.

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
