Closed Bug 1649951 Opened 4 years ago Closed 4 years ago

DigiCert: Incorrect OCSP Delegated Responder Certificate

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ryan.sleevi, Assigned: martin.sullivan)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Attachments

(3 files)

The following was originally reported to m.d.s.p. at https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg13493.html

DigiCert has issued one or more OCSP Delegated Responders, as defined within RFC 6960, Section 2.6 and Section 4.2.2.2, without including the id-pkix-ocsp-nocheck extension, as required by the Baseline Requirements, Version 1, Section 13.2.5 through Version 1.7.0, Section 4.9.9.

Example certificate: https://crt.sh/?id=21606058
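
For readers unfamiliar with the requirement, the check being described can be sketched roughly as follows. This is a minimal illustration, not DigiCert's or Mozilla's tooling: the function names and the sample inputs are hypothetical, and a certificate is modelled as two plain sets of OID strings rather than parsed from DER. Only the two OID values (id-kp-OCSPSigning and id-pkix-ocsp-nocheck) are taken from the RFCs.

```python
# OID values from RFC 5280 / RFC 6960. Everything else is a hypothetical sketch.
ID_KP_OCSP_SIGNING = "1.3.6.1.5.5.7.3.9"       # EKU that marks an OCSP Delegated Responder
ID_PKIX_OCSP_NOCHECK = "1.3.6.1.5.5.7.48.1.5"  # id-pkix-ocsp-nocheck extension


def is_delegated_responder(eku_oids):
    """A certificate carrying the OCSPSigning EKU can act as a Delegated Responder."""
    return ID_KP_OCSP_SIGNING in eku_oids


def missing_nocheck(eku_oids, extension_oids):
    """True when the certificate is a Delegated Responder but lacks ocsp-nocheck."""
    return is_delegated_responder(eku_oids) and ID_PKIX_OCSP_NOCHECK not in extension_oids


# Hypothetical intermediate: serverAuth, clientAuth and OCSPSigning EKUs,
# with only basicConstraints, keyUsage and extKeyUsage extensions present.
sample_ekus = {"1.3.6.1.5.5.7.3.1", "1.3.6.1.5.5.7.3.2", ID_KP_OCSP_SIGNING}
sample_extensions = {"2.5.29.19", "2.5.29.15", "2.5.29.37"}

flagged = missing_nocheck(sample_ekus, sample_extensions)
```

In this simplified model, an intermediate whose EKU set includes OCSPSigning is flagged whether or not it was intended to serve as a responder, which is the situation this bug describes.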

Please provide an incident report, including the timeline for revocation.

We acknowledge this problem report and are investigating it prior to posting a response.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

DigiCert originally became aware of this from a post to MDSP https://groups.google.com/forum/#!topic/mozilla.dev.security.policy/EzjIkNGfVEE which was followed up with this bug being created.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

1-July-2020 21:06 Original post on MDSP: https://groups.google.com/forum/#!topic/mozilla.dev.security.policy/EzjIkNGfVEE
1-July-2020 21:10 Message was picked up and discussion was started.
2-July-2020 01:44 This bug was created (reference: https://bugzilla.mozilla.org/show_bug.cgi?id=1649951).
2-July-2020 16:37 Tim posted DigiCert’s position to the Mozilla mailing list.
2-July-2020 22:53 Scope list of ICAs was generated. Customer notifications were sent.
2-July-2020 through 7-July-2020: Phone calls with users to determine how best to manage the shutdown.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

The affected certificates are all legacy ICAs from 2016. Moving forward, DigiCert will not issue any new certificates with the OCSPSigning EKU. The ICAs themselves are still issuing certificates. We’re working on a shut-down plan and are ready to do any additional emergency key ceremonies to resolve this as expeditiously as possible.

The ICAs impacted are:
Chaining back to the DigiCert Global Root G3
ABB ECC Intermediate CA 1 - https://crt.sh/?id=155179794

Chaining Back to the Baltimore CyberTrust Root
ABB Intermediate CA 4 - https://crt.sh/?id=155179795
IEXTCA-SSL.ibechtel.com - https://crt.sh/?id=9400462
VZ Cybertrust Client CA - https://crt.sh/?id=135970330
Microsoft IT TLS CA 1 - https://crt.sh/?id=21606064
Microsoft IT TLS CA 2 - https://crt.sh/?id=21606056
Microsoft IT TLS CA 4 - https://crt.sh/?id=21606070
Microsoft IT TLS CA 5 - https://crt.sh/?id=21606058

Chaining Back to the Symantec Class 2 Public Primary Certification Authority - G6
KPN Class 2 CA - https://crt.sh/?id=341594698

This represents five customers, each of which is requesting a different timeline to mitigate the operational risk. Revocation in each case means revoking the intermediate and destroying the key material.

Customer 1:
ABB ECC Intermediate CA 1 - https://crt.sh/?id=155179794
ABB Intermediate CA 4 - https://crt.sh/?id=155179795
We have revoked these Code Signing constrained ICAs this week.

Customer 2:
IEXTCA-SSL.ibechtel.com - https://crt.sh/?id=9400462
The parent of this ICA is already revoked.

Customer 3:
KPN Class 2 CA - https://crt.sh/?id=341594698
We plan to revoke this ICA by mid-October. There are a few sub CAs under this ICA, and we are still working through the overall impact of the revocation.

Customer 4:
VZ Cybertrust Client CA - https://crt.sh/?id=135970330
We are planning on revoking this at the end of July. They are in active replacement mode of all certificates chaining to the intermediate.

Customer 5:
Microsoft IT TLS CA 1 - https://crt.sh/?id=21606064
Microsoft IT TLS CA 2 - https://crt.sh/?id=21606056
Microsoft IT TLS CA 4 - https://crt.sh/?id=21606070
Microsoft IT TLS CA 5 - https://crt.sh/?id=21606058

They’ve asked that we revoke in eight months. They are planning to rebuild their security environment and move to a new certificate chain. Microsoft have asked for this time period for the reasons below. Considering the size and number of certificates, along with the remediation being undertaken, we believe the timeline is appropriate.

  1. 6+ million certificates are issued from the current set of CAs
  2. External and internal services would need to move to a new certificate chain. This would need careful updating of the code to prevent any outages or downtime to their end customers. If customers are doing certificate pinning, then it becomes a more involved process and it further needs careful update of all impacted clients to ensure it is fixed before certificate roll over happens. This is to ensure the industry and services are not severely impacted by this roll over.

Timelines are based on whether certificates are pinned:
  • Non-pinned certs
    • Move all non-pinned certs at an expedited rate to another compliant provider. This has already started, and the plan is to complete it by November.
  • Pinned certs
    • For all the pinned certs, we will be ready for issuance by end of July. Internal teams and customers will start moving to these new ICAs, with an expected end date of end of February.

Bechtel:
https://crt.sh/?id=881434274
https://crt.sh/?id=6976985
Since the parent was revoked years ago, and we no longer have a commercial agreement in place, we are contacting Bechtel to inform them of this issue.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

See above.

Each of these ICAs was issued between 11-Feb-2016 and 25-Oct-2016. Each of them has active certificates issued under the chain. The most unusual case is ibechtel.com, where the parent is revoked but the ICA is live. We acknowledge there is a security concern with forged OCSP responses despite the parent being revoked and are taking action to revoke that ICA and destroy the keys.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

See #4

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We recognize the security risk and are replacing the certificates. DigiCert doesn’t primarily use delegated responders, meaning there is no reason to create new ICAs with the OCSPSigning EKU going forward.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

We are revoking all ICAs with OCSPSigning EKU and having the customers destroy the key material when we are able to get this arranged.

Given the present situation with lockdowns and travel restrictions, we will provide an update on how we can carry out a witnessed key destruction activity or file a delay of revocation bug considering the pandemic situation.

Timelines are provided in #3.

Hi Martin,

There seem to be a few substantially longer timelines than what Tim suggested, and that’s quite surprising and a little worrying.

I’m lacking a lot of context here, because the incident report simply isn’t including the needed information. It seems like for some customers, the answer is “We don’t plan to revoke because it’s hard (and/or because they’re a root store we need our certificates to work with)”. I can understand wanting to be careful there, but you seem to have no description about what controls are in place, no disclosure of your risk analysis or how you’re mitigating it, and no description of how you’re addressing community concerns in the interim. “Things are hard” is, in the abstract, an understandable challenge. The purpose of these reports is to unpack specific challenges, and then demonstrate how the CA has a plan going forward to mitigate them.

More importantly, for all of these delays, there’s seemingly no description of any plan how this will be better. At best, it might be “they’ll be somebody else’s problem”, but that doesn’t inspire hope that the root cause will be mitigated.

For all of this, there are no milestones to quantify progress, nor is there any clear reasoning behind the dates. It seems like they were selected largely as a process of rolling a fair die.

In terms of making sense of timelines, and making sure there aren’t details missing, can you provide a forward facing timeline of milestones, chronologically ordered?

Flags: needinfo?(martin.sullivan)

Are you asking for more details about Microsoft since it's the long pole? There are details on that one, but we can get more from them. For the rest:

Unknown:
IEXTCA-SSL.ibechtel.com - https://crt.sh/?id=9400462
Because the parent of this is already revoked and we don't have a relationship with this entity, we are working through how to even do this one. We've reached out to the key holder to talk about key destruction (and whether they still have this one). We don't have a good ETA on this because we haven't heard back from the key holder on when key destruction can occur.

Week of July 5th:
Revoke the following:
ABB ECC Intermediate CA 1 - https://crt.sh/?id=155179794
ABB Intermediate CA 4 - https://crt.sh/?id=155179795
(and schedule key destruction - working on that)

July 24th:
VZ Cybertrust Client CA - https://crt.sh/?id=135970330
They are actively replacing certificates now. There are about 1300 that are being used by various federal entities. They asked for end of July to ensure there isn't an interruption to services supporting federal agencies that don't move quickly, especially during a pandemic.

Mid October:
KPN Class 2 CA - https://crt.sh/?id=341594698
We are trying to pull this in to an earlier date and are creating a replacement CA now. October is the latest for key destruction. We're going to try to revoke the ICA in September. Similar to PKI Overheid, they control several issuing Sub CAs under them that issue S/MIME certificates. We are talking to the sub CAs under the KPN root and working through how to revoke them and get key destruction.

Feb 2021:
See above for the reasons. If that wasn't sufficient explanation, we can have them provide more information.

Of the five, the first four are the ones we'd like to accelerate the most (even with the short timeframes) since they pose the biggest security risks. Microsoft, on the other hand, is already a trusted CA within Mozilla, Microsoft, and Apple, meaning they share responsibility for correct operations. Although this does not mitigate the unaccepted compounded risk that if one CA is compromised then the whole CA is compromised, we weren't sure what additional time demands this placed on ICA replacement.

If any of these timelines are long, then we would appreciate guidance from additional browsers on what a reasonable timeline for replacement would be. Likewise, happy to provide more information about each of these ICAs that lack sufficient details compared to the timeline specified for revocation and key destruction.

(In reply to Jeremy Rowley from comment #4)

> Are you asking for more details about Microsoft since it's the long pole? There are details on that one, but we can get more from them. For the rest:
>
> Unknown:
> IEXTCA-SSL.ibechtel.com - https://crt.sh/?id=9400462
> Because the parent of this is already revoked and we don't have a relationship with this entity, we are working through how to even do this one. We've reached out to the key holder to talk about key destruction (and whether they still have this one). We don't have a good ETA on this because we haven't heard back from the key holder on when key destruction can occur.

Right, in this case, the impact of that sub-sub-CA is only to affect that sub-CA, so that seems mitigated.

> Mid October:
> KPN Class 2 CA - https://crt.sh/?id=341594698
> We are trying to pull this in to an earlier date and are creating a replacement CA now. October is the latest for key destruction. We're going to try and revoke the ICA in Sept. Similar to PKI Overheid, they control several issuing Sub CAs under them that issue smime. We are talking to the sub CAs under the KPN root and working through how to revoke them and get key destruction.

Thanks. That's Bug 1649964, for future reference.

> Feb 2021:
> See above for the reasons. If that wasn't sufficient explanation, we can have them provide more information.
>
> Of the five, the first four are the ones we'd like to accelerate the most (even with the short timeframes) since they pose the biggest security risks. Microsoft, on the other hand, is already a trusted CA within Mozilla, Microsoft, and Apple, meaning they share responsibility for correct operations. Although this does not mitigate the unaccepted compounded risk that if one CA is compromised then the whole CA is compromised, we weren't sure what additional time demands this placed on ICA replacement.

Isn't it correct that these are two separate teams? The fact that it's "not the same" has been used in the past to excuse issues (e.g. Bug 1604124 / Bug 1424305), so I'm trying to square that here.

Since you mentioned PKIoverheid's response, compare that set of proposed controls (still ongoing discussion) with what DigiCert has offered here, and there's a notable lack of detail.

I would say with the detail you've provided, the timeline is unacceptably long. That's why I asked, and continue to ask, for more detail, to better understand if the timeline proposed is 'safe' and 'reasonable'. The goal of these reports is to understand how, in this situation, the CA is planning to show nothing has gone wrong, nothing can go wrong, and nothing will go wrong, for whatever time period they're proposing. Similarly, to understand what steps are being taken to mitigate and prevent future delays. We've seen in the past, for example with low-entropy serial numbers, that a number of CAs were able to quickly replace millions of certificates, in an order less than O(months). Looking to understand why that doesn't work here is important to understanding how this will be fixed going forward.

To try to put differently: If the argument is that the volume of certs is large, then establishing details comparing this to other similarly large-or-larger volumes of certs in the past is useful to understand. Did the CA fail to learn from those past experiences? Were they in the process of implementing solutions, like automation? If so, what was the (old) timeline, why wasn't it completed by now, and what's being done going forward? If this is seen as different than past incidents, sharing details about why it's different is relevant. If it's the same as past incidents, sharing details about why they weren't learned from is relevant. Sharing details about how, in the future, CAs can avoid this being an incident, and specifically how this CA is avoiding a repeat, is relevant.

On the details of this incident report alone, one might conclude that, in the future, if there were any future issues with intermediates, it would also take 7 months to remediate, rather than 7 days, because no detail is given here about the delay and its mitigation. That would be unacceptable then, and that's why it should be unacceptable now. Detail is what helps inform the balanced tradeoff of saying "We accept the risk now, because in the future, we believe there's a clear and viable strategy to reduce the risk, and we can see this as an opportunity to learn".

Just a quick Update,

We are working with Microsoft on a plan to reduce their timeline. We are hoping to have something we can post publicly by the end of this week.

IEXTCA-SSL.ibechtel.com - https://crt.sh/?id=9400462
They are preparing a formal letter describing their destruction process, which was completed some time ago; they expect to provide it by 24 July.

VZ Cybertrust Client CA - https://crt.sh/?id=135970330
KPN Class 2 CA - https://crt.sh/?id=341594698
Both are on track with their timelines.

Note: This response encompasses impacted Issuing CAs managed by DSRE PKI at Microsoft, which includes the following Issuing CAs:

  • Microsoft IT TLS CA 1
  • Microsoft IT TLS CA 2
  • Microsoft IT TLS CA 4
  • Microsoft IT TLS CA 5

We have taken this very seriously and have grouped this into 2 major issues. There is the original issue related to the OCSPSigning EKU. And this highlighted a different issue related to our ability to meet the CA/B Forum revocation requirements without causing major outages for our users. We are working on both, but this update will only focus on planning for the eventual key destruction of our 4 impacted ICAs.

While this update does not have details about how we are going to eventually meet the CA/B Forum revocation requirements, we do acknowledge that it is our goal to get there and we will have a plan as part of this remediation.

First, here is an update of what we have done so far:

  • We’re moving dependencies away from the DigiCert Baltimore Root CA as soon as possible
    • High profile certs have already started to be re-issued through Azure to compliant DigiCert ICAs
    • We’re starting issuance through compliant ICAs (hosted by a different PKI team within Microsoft) by end of July for most other certs. This was planned for 2021 as part of a larger initiative to increase agility during security or compliance issues but is being deployed earlier as part of this remediation.
    • For any certificates that still have a dependency on Baltimore, the DSRE PKI team within Microsoft have already deployed 2 new compliant ICAs. These will also be able to start supporting re-issuance by the end of July.
  • We created a project to track the re-issuance activity for any certificates issued from the above 4 ICAs
    • To avoid CRL bloat, we are not planning to perform mass-revocation of leaf certificates. Instead, we are ramping up tracking of each certificate to ensure that the subscriber has re-issued it from a compliant ICA and has acknowledged it is no longer needed.
    • We have hired additional staff to track this project burn-down and manage internal communications with subscribers and internal leadership.
    • We will issue monthly status updates with percentage of subscriber certificates remaining and updated timeline estimates as the project progresses

While the OCSPSigning EKU issue will not be fully remediated until we destroy the CA keys, several mitigations have been implemented to reduce the risk:

  • By the end of July, DigiCert will revoke all other ICAs chaining to the DigiCert Baltimore Root, leaving only the 4 Microsoft IT TLS CAs
  • By the end of July, we will have all infrastructure in place to support re-issuance from compliant ICAs
  • We have prioritized subscribers with a higher business impact and have already started re-issuance of their certificates from compliant ICAs
  • Windows Update will remove the OCSPSigning EKU from the Baltimore Root on all Windows clients in its August release

Our remediation plan considers the urgency to eventually destroy the ICA keys, while balancing the subscriber impact. Our rough timeline includes:

  • 2020-08-01: All infrastructure ready to issue certificates for all subscriber dependencies
  • 2020-09-01: 10% of certificates re-issued and acknowledged by the subscribers as no longer needed
  • 2020-10-01: 50% of certificates re-issued and acknowledged by the subscribers as no longer needed
  • 2020-11-01: 80% of certificates re-issued and acknowledged by the subscribers as no longer needed
  • 2021-01-15: 95% of certificates re-issued and acknowledged by the subscribers as no longer needed. Default deny certificate request from legacy CAs
  • 2021-02-15: 100% of certificates re-issued and acknowledged by the subscribers as no longer needed. The 4 impacted Microsoft IT TLS CAs will be ready for key destruction.
  • 2021-03-01: The 4 impacted Microsoft IT TLS CAs will complete key destruction

As mentioned, this is still a rough timeline. While we are confident in meeting the end date, the estimated percent complete will be adjusted as the project progresses.
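
One way to read the milestones above is as a step function from a date to the minimum share of certificates that should have been re-issued by then. The sketch below is illustrative only: the helper names are hypothetical, the 6,000,000 total is a round stand-in for the "6+ million" figure mentioned earlier in this bug, and the percentages are the rough targets from the list, which may be adjusted as the project progresses.

```python
from datetime import date

# Rough re-issuance targets copied from the timeline above: (due date, percent re-issued).
MILESTONES = [
    (date(2020, 9, 1), 10),
    (date(2020, 10, 1), 50),
    (date(2020, 11, 1), 80),
    (date(2021, 1, 15), 95),
    (date(2021, 2, 15), 100),
]


def target_percent(on):
    """Return the percentage required by the latest milestone due on or before `on`."""
    required = 0
    for due, percent in MILESTONES:
        if due <= on:
            required = percent
    return required


def is_on_track(on, reissued, total):
    """Compare the re-issued share against the target, using integer arithmetic."""
    return reissued * 100 >= target_percent(on) * total


TOTAL = 6_000_000  # assumed round figure, for illustration only

mid_october_target = target_percent(date(2020, 10, 15))
november_status = is_on_track(date(2020, 11, 1), 4_800_000, TOTAL)
january_status = is_on_track(date(2021, 1, 15), 5_000_000, TOTAL)
```

A monthly status update of the kind promised above could report the actual percentage next to `target_percent` for the same date, which makes any slippage visible early.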

Thanks Dustin. As far as detailed plans go, I think this is a really good example.

That said, I’m hoping you can also share plans about managing the interim state from the perspective of audits and key controls. It wasn’t clear whether that was the plan you mentioned as forthcoming and still being worked on. If it was, I think a similarly detailed plan, as in Comment #7, which incorporates best practices regarding audits and transparency would be highly valued. If I misunderstood, then I think that’s still something that’s reasonable to ask: how Relying Parties can be confident nothing has, can, or will go wrong. This is definitely the most CA-specific part, and transparency here is the most useful element to thinking long-term how we can better manage such incidents.

To make sure I understand: The plan regarding how the new DSRE ICAs will be agile so as to avoid similar delays in the future is also forthcoming, correct? Just wanting to make it appropriately clear what’s still TBD. If I understand correctly, that’s something for which you don’t have a concrete timeline, but you expect to develop it as part of this migration and the lessons learned?

Ryan,

I broke this out into separate questions based on my interpretation of what I understand you are asking:

1. How can we get assurance the problem is addressed once all is complete?
We will perform key destruction ceremonies for each impacted ICA’s key pair, which will be witnessed and reported on by our external auditor. In addition, we will be receiving a report on the generation of the new ICAs.

2. How can we get assurance controls are in place to reduce the risk of unauthorized use or access from our impacted CAs in the interim?
By the end of July, DigiCert is planning to complete the key destruction of all Issuing CAs under the Baltimore root, except for the 4 Microsoft DSRE PKI ICAs (Comment #2). This dramatically reduces the risk by allowing relying parties to selectively distrust these issuing CAs and allows for focus on these 4 ICAs that are operated identically.

These are some of the controls currently in place to reduce the risk of unauthorized use or access:

  • We use template-based issuance with one template for the RA application and a second for the OCSP delegated responder, preventing the ICA software from signing unapproved OCSP responses.
  • ICA keys are stored and all cryptographic operations are performed within the HSMs. Access to the keys on the HSMs is configured with a physical token and a passphrase protection.
  • The ICAs and HSMs are in an isolated network that is highly restricted. They have no direct connection to the intranet or internet.
  • Multi-person controls are implemented for logical and physical access to the systems and access is limited to only members of the DSRE PKI team.
  • We only allow approved code through our SDLC release process and any other software packages require a thorough vetting through our security and risk assessment process from a group that has no access to the PKI environment.

These key use and protection criteria will be included in the key destruction report.

We are also researching additional monitoring to immediately report if there is an attempt to sign an OCSP message by any of the ICAs.

3. How can we ensure this will be able to be resolved quicker in the future?
The plan that we still need to provide will be related to the ultimate goal of how all DSRE PKI publicly-trusted CAs can meet the BR key destruction time periods. We will provide a timeline in a future update that will include the items that we are planning to change.

Thank you,

Dustin

This week’s update on the CAs

IEXTCA-SSL.ibechtel.com - https://crt.sh/?id=9400462
They have submitted a signed attestation confirming that the HSMs are zeroized and the keys destroyed.
For legal reasons they keep a back-up of the CA material in case of an investigation.
If these were ever to be recovered, in-depth assistance from the HSM maker would be required and DigiCert would be notified of this.

VZ Cybertrust Client CA - https://crt.sh/?id=135970330
This was revoked on schedule on July 24, 2020. We are now scheduling Key destruction; once auditor availability is confirmed we will give the date for this.

ABB - https://crt.sh/?id=155179795
We are confirming the availability of an external auditor to witness the destruction. To reconfirm, this CA has been revoked.

KPN Class 2 CA - https://crt.sh/?id=341594698
On track with the mentioned timelines. The CAs will be revoked by no later than October 20, 2020 with a plan for key destruction by end of October / early November (subject to Auditor Witness availability)

Microsoft have elaborated on their plan with timelines above:
They are on track with that plan to date.

Flags: needinfo?(martin.sullivan)

> For Legal reasons they keep a back-up of the CA material in case of an Investigation.

I'm not sure I understand this. It sounds like there was not actually a CA key destruction? Was this attestation witnessed by auditors or is this a self-attestation?

Flags: needinfo?(martin.sullivan)

Quick update: we are waiting on a reply from Bechtel to see if there is evidence of the key destruction.

Verizon are currently scheduled for their witnessed destruction by the end of next week.

ABB is revoked, and work is under way to arrange a witness for the key destruction.

KPN and Microsoft are both working towards their current timelines.

Flags: needinfo?(martin.sullivan)

This week's update:

  • Bechtel's key destruction in 2017 when we terminated our relationship was witnessed by another Bechtel employee, not by an external auditor.

  • Verizon completed the witnessed key destruction this week; the auditor's report will be available shortly.

  • ABB is revoked. Due to travel restrictions imposed by their company, and with the crypto material being in a separate country from their PKI operations team, the key destruction is expected to be completed around Q1 2021.

  • KPN and Microsoft are both working towards their current timelines.

No updates for this week. Everyone is still on the same timeline.

Please find attached the Key destruction report for the Verizon ICA.

A progress update can be found at:
https://bugzilla.mozilla.org/show_bug.cgi?id=1651461#c11

We are planning the next update for Nov 1st.

An update including the revocation of the KPN CAs can be found at:

https://bugzilla.mozilla.org/show_bug.cgi?id=1651461

Please find an update on the status of these CAs in the related bug.

https://bugzilla.mozilla.org/show_bug.cgi?id=1651461

You can find current progress here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1651461

This has been updated in the bug below:
https://bugzilla.mozilla.org/show_bug.cgi?id=1651461

Please find the Key destruction report for the KPN CA attached.

We are still on track for the revocation of the Microsoft CAs next week.

I can confirm the revocation happened on schedule on the 16th.

The key destruction audit/report is currently in progress.

Once we have received and posted this report, is there anything further needed to close this off?

(In reply to Martin Sullivan from comment #29)

> Once we have received/posted this report is there anything further needed to close this off?

I believe that the issue can be closed once we have the report.

See above for Microsoft's witnessed key destruction report.

This was the last action we had for this case.

If there are no other questions, I will ask for this to be closed off.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ocsp-failure]