Bug 1566162 (Open) · Opened last month · Updated 9 hours ago

DigiCert: Failure to supervise ABB Subordinate CA

Categories

(NSS :: CA Certificate Compliance, task)


ASSIGNED

People

(Reporter: wayne, Assigned: jeremy.rowley)

Details

(Whiteboard: [ca-compliance])

Attachments

(1 file)

In bug 1456655, we have been waiting for ABB to revoke roughly 500 certificates since the intermediates misissued by DigiCert were discovered in April 2018. Since then, only about half of the certificates have been revoked. Despite "working on accelerated replacement", ABB is making little progress, and DigiCert has apparently not set any deadlines for ABB. DigiCert appears to simply be communicating status when asked, as highlighted in comment #22, in which DigiCert exposes a failure to even follow up on the issue. We place a tremendous amount of trust in and responsibility on CAs that sign CA certificates for other organizations, and when CAs fail to provide proper oversight, bad things like the Symantec distrust result.

Please provide an incident report describing what happened, how it will be remediated, and how oversight of subordinate CAs will be improved.

If the plan is, as mentioned in bug 1456655, to end the use of externally operated subordinate CAs, when will that happen (i.e. when will the last one be revoked)?

For clarity, the plan posted was not all externally operated CAs - the plan was to get rid of all externally operated non-browser TLS Sub CAs.

Can we also clarify what you mean by not working on accelerated replacement? In the bug, we said "CA 8 and CA 9 are signed by CA 5 and are therefore dependent on it remaining valid. CA 8 and CA 9 last leaf certs expire: October 19, 2020; we are working on accelerated replacement." (comment 7) We are currently shooting for the end of the year. If the original plan was October 2020, then end of the year cuts off 10 months from the original timeline. We track the progress of shut down pretty closely, but generally we track no-more-issuance more closely than we do getting off the Sub CA. Note that ABB is not issuing new certs off this ICA.

I would say we haven't been improperly overseeing as much as not properly communicating status changes. What we need is a better reporting system to ensure we post to Bugzilla regularly, even if that post is "nothing new to report". This will make the monitoring of long-term projects like this bug and the dev projects more transparent. We will draw up a plan and present it here on how we are going to do that better.

We didn't think Mozilla was missing information, given that the deadline for revocation was still over a year away. Apologies if the updates weren't frequent enough. How often would you like information? We're thinking that the updates would happen every two weeks going forward. Does that work for you?

Also, can I object to the "failure to supervise" characterization? We verify that the CAs aren't issuing new certs and ask for updates regularly. We are aware of what is going on. For example, we know they had a change in staff at ABB which impacted the number of certificates revoked this last month, resulting in a relatively low level of replacements. We weren't too concerned because they are still tracking to end of year.

I'll post the status of all other TLS CAs in a minute, once I get the data a bit more organized. All but one of the non-Quovadis CAs are no longer issuing, I think. Most are already shut down. We don't have timelines for the Quovadis Sub CAs yet, and are working to establish them.

Another interesting question is what is a reasonable timeline for shut down? We were thinking of letting most of these expire naturally as long as there is no more issuance. Belgium has been inactive for a while. We don't plan on turning off the ICA any time soon. CTJ will be off soon. At that point, we plan on allowing them to sign revocation responses but no new certificates.

Jeremy,

Thanks for the response.

(In reply to Jeremy Rowley from comment #1)

> For clarity, the plan posted was not all externally operated CAs - the plan was to get rid of all externally operated non-browser TLS Sub CAs.

Can you explain what an "externally operated non-browser TLS Sub CA" is? Are these certs that don't require public trust being issued from a publicly-trusted hierarchy?

> Can we also clarify what you mean by not working on accelerated replacement? In the bug, we said "CA 8 and CA 9 are signed by CA 5 and are therefore dependent on it remaining valid. CA 8 and CA 9 last leaf certs expire: October 19, 2020; we are working on accelerated replacement." (comment 7) We are currently shooting for the end of the year. If the original plan was October 2020, then end of the year cuts off 10 months from the original timeline. We track the progress of shut down pretty closely, but generally we track no-more-issuance more closely than we do getting off the Sub CA. Note that ABB is not issuing new certs off this ICA.

So the comment on "accelerated replacement" is relative to the natural expiration of the certificates rather than the BR revocation deadline? I interpreted the statement to mean that the certificates would be revoked quickly, but not within the 7 days permitted by the BRs for misissued subordinate CA certificates.

> I would say we haven't been improperly overseeing as much as not properly communicating status changes. What we need is a better reporting system to ensure we post to Bugzilla regularly, even if that post is "nothing new to report". This will make the monitoring of long-term projects like this bug and the dev projects more transparent. We will draw up a plan and present it here on how we are going to do that better.

Sounds good. Lack of communication is certainly a factor.

> We didn't think Mozilla was missing information, given that the deadline for revocation was still over a year away. Apologies if the updates weren't frequent enough. How often would you like information? We're thinking that the updates would happen every two weeks going forward. Does that work for you?

I don't see anywhere in the bug a statement that the target for revocation was the end of 2019.

Mozilla's incident reporting guidance requests weekly updates unless a Mozilla representative suggests or accepts a different schedule, which typically only happens when there is a plan with dates in place.

> Also, can I object to the "failure to supervise" characterization? We verify that the CAs aren't issuing new certs and ask for updates regularly. We are aware of what is going on. For example, we know they had a change in staff at ABB which impacted the number of certificates revoked this last month, resulting in a relatively low level of replacements. We weren't too concerned because they are still tracking to end of year.

It appears to me that DigiCert did not set deadlines and push ABB to remediate, and per comment #22 didn't proactively seek updates.

> I'll post the status of all other TLS CAs in a minute, once I get the data a bit more organized. All but one of the non-Quovadis CAs are no longer issuing, I think. Most are already shut down. We don't have timelines for the Quovadis Sub CAs yet, and are working to establish them.

Per my first question, does this wind down include all externally operated TLS capable sub CAs trusted by Mozilla?

> Another interesting question is what is a reasonable timeline for shut down? We were thinking of letting most of these expire naturally as long as there is no more issuance. Belgium has been inactive for a while. We don't plan on turning off the ICA any time soon. CTJ will be off soon. At that point, we plan on allowing them to sign revocation responses but no new certificates.

The point of my question was to help me understand how long this kind of oversight issue will remain a concern.

> Can you explain what an "externally operated non-browser TLS Sub CA" is? Are these certs that don't require public trust being issued from a publicly-trusted hierarchy?

This means everything hosted externally issuing TLS that is not Apple or Microsoft. We don't plan on shutting down non-TLS CAs or issuing CAs provided to Apple or Microsoft. All others are being deprecated.

> So the comment on "accelerated replacement" is relative to the natural expiration of the certificates rather than the BR revocation deadline? I interpreted the statement to mean that the certificates would be revoked quickly, but not within the 7 days permitted by the BRs for misissued subordinate CA certificates.

Yes. The issuing CA had been around well past 7 days when the bug was posted; we meant that we were going to get them off the on-prem solution before Oct 2020. We've proposed Aug 2019 as the turn-off date and are waiting to hear back from them. If you'd like an earlier shut-down period, we can ensure that happens.

> I don't see anywhere in the bug a statement that the target for revocation was the end of 2019.

Yes - That was the lack of communication again. The plan was for end of 2019. And we should have been showing that with the periodic updates. With both the development bugs (like the CAA bug) and these longer term Sub CA shut down bugs, we plan on posting the incremental steps to get to the shut down goal. That will be part of the plan we post. Brenda is working on a plan now.

> Mozilla's incident reporting guidance requests weekly updates unless a Mozilla representative suggests or accepts a different schedule, which typically only happens when there is a plan with dates in place.

Sounds good. We will go with one week except where two weeks was expressly stated (on the dev bugs).

> It appears to me that DigiCert did not set deadlines and push ABB to remediate, and per comment #22 didn't proactively seek updates.

Fair enough.

> Per my first question, does this wind down include all externally operated TLS capable sub CAs trusted by Mozilla?

Minus Apple and Microsoft.

> The point of my question was to help me understand how long this kind of oversight issue will remain a concern.

There are gradients of concern, I think. The first concern we've had is to stop them from issuing. That has been our primary concern. The second concern is to wrap up the on-prem CA. We've got all of the legacy sub CAs except CTJ at the stop-issuing point. With the Quovadis acquisition, there are a couple of new ones to shut down. With all of the external CAs used only to sign revocation, the risk itself is much lower (no new problems), so we can focus on getting the actual shut down completed. This has proven more difficult, and we've generally waited for the certs to expire.

Here's the current status:

  1. ABB - 248 certs remaining until we shut down the TLS Sub CA. Currently planned to shut down end of 2019. New proposal is Aug 2019, although we are waiting to hear back from them.
  2. Belgium Government - Added to OneCRL
  3. CTJ - Still issuing. The contract expires later this year, at which point they will cease issuing and be on the "wait until all certs expire" plan. Actively migrating to a hosted solution, but migration was interrupted by Symantec acquisition.
  4. Siemens - No longer issuing. Waiting for all certs to expire
  5. T-Systems - No longer issuing. Waiting for all certs to expire
  6. Verizon - No longer issuing. Revocation of 4 additional CAs pending. Working on migrating rest of CAs.

Quovadis:

  1. Bayern - Ceased issuance this month
  2. Darkmatter - Negotiations in progress
  3. Fiducia - Cease issuance in effect. Waiting for final webtrust report to begin shut down
  4. Siemens - Beginning negotiations

(In reply to Jeremy Rowley from comment #3)

> So the comment on "accelerated replacement" is relative to the natural expiration of the certificates rather than the BR revocation deadline? I interpreted the statement to mean that the certificates would be revoked quickly, but not within the 7 days permitted by the BRs for misissued subordinate CA certificates.

> Yes. The issuing CA had been around well past 7 days when the bug was posted; we meant that we were going to get them off the on-prem solution before Oct 2020. We've proposed Aug 2019 as the turn-off date and are waiting to hear back from them. If you'd like an earlier shut-down period, we can ensure that happens.

I must object to the framing of "if you'd like an earlier shut-down period". Your CA is responsible for ensuring compliance with the Baseline Requirements. We would like all CAs to adhere to the Baseline Requirements and the Root Program rules. Some CAs may make deliberate business decisions to disregard those requirements, and when that becomes a pattern, it often becomes necessary for root stores to take action, if there is a demonstration that the CA is unwilling or unable to do what is expected of them and what they have committed to.

With respect to the timeline, it's perhaps clearest to lay out the issues:

  • 2018-03-16 - Issue reported
  • 2018-04-24 - DigiCert acknowledges the issue with an incident report.
  • 2018-06-25 - Wayne highlights that DigiCert has not acknowledged any intention to revoke, seeking to clarify that is the intent.
  • 2018-06-26 - DigiCert commits to "accelerated replacement"
  • 2018-06-27 - Wayne requests periodic updates, with a deadline of 2018-10-01
  • 2018-10-12 - DigiCert acknowledges no progress has been made.
  • 2018-12-26 - Ryan again requests periodic updates, noting concern about the issue of "accelerated replacement" and lack of periodic updates.
  • 2019-01-08 - DigiCert acknowledges limited progress, but no details. Importantly, DigiCert commits to 2019-03 to have all e-mail certificates and a majority of server certificates replaced. This understanding is confirmed the same day.
  • 2019-04-19 - DigiCert reports that a minority of certificates (192 of 492, or roughly 39%) have been revoked, leaving 300 valid. DigiCert commits to monthly updates.
  • 2019-06-17 - DigiCert is reminded they have not provided updates.
  • 2019-06-18 - DigiCert reports that an additional 52 certificates were revoked. At this point, only 49.5% of certificates have been revoked, three months after the DigiCert had committed to have a majority replaced, and 15 months after the incident.
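
For reference, the figures in this timeline are mutually consistent; a quick sketch (totals taken from the comments above) checks the arithmetic:

```python
# Sanity-check the revocation percentages cited in the timeline above.
TOTAL = 492                       # total misissued certificates reported
REVOKED_APR = 192                 # revoked as of 2019-04-19
REVOKED_JUN = REVOKED_APR + 52    # 52 more revoked by 2019-06-18

apr_pct = 100 * REVOKED_APR / TOTAL   # ~39.0%, the "roughly 39%" above
jun_pct = 100 * REVOKED_JUN / TOTAL   # ~49.6%, the "~49.5%" cited above
remaining = TOTAL - REVOKED_JUN       # 248 certs still valid
print(f"April: {apr_pct:.1f}%  June: {jun_pct:.1f}%  remaining: {remaining}")
```

The 248 remaining certificates match the ABB status reported later in this bug.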

I highlight this because it is not just an issue about a lack of updates, but that DigiCert has either misled the community or repeatedly missed its committed milestones.

> Yes - That was the lack of communication again. The plan was for end of 2019. And we should have been showing that with the periodic updates. With both the development bugs (like the CAA bug) and these longer term Sub CA shut down bugs, we plan on posting the incremental steps to get to the shut down goal. That will be part of the plan we post. Brenda is working on a plan now.

I'm unable to square this description with https://bugzilla.mozilla.org/show_bug.cgi?id=1456655#c11 . That is, I don't believe it's a lack of communication, I believe it may be miscommunication, whether deliberate or not. The things you are telling us are not fitting with the things you are reporting you did not tell us.

> Mozilla's incident reporting guidance requests weekly updates unless a Mozilla representative suggests or accepts a different schedule, which typically only happens when there is a plan with dates in place.

> Sounds good. We will go with one week except where two weeks was expressly stated (on the dev bugs).

Are you referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1456655#c24 ? Or is there some other comment you are referring to?

> I must object to the framing of "if you'd like an earlier shut-down period". Your CA is responsible for ensuring compliance with the Baseline Requirements. We would like all CAs to adhere to the Baseline Requirements and the Root Program rules. Some CAs may make deliberate business decisions to disregard those requirements, and when that becomes a pattern, it often becomes necessary for root stores to take action, if there is a demonstration that the CA is unwilling or unable to do what is expected of them and what they have committed to.

Sorry, I meant: do you think the end of the year is reasonable for the shut down, or should it be earlier? We are having a hard time shutting this one down. I've proposed a final cut-off of Aug 30 to ABB, but we may need to negotiate a later date with them. We'd like to shut this down as soon as possible.

> With respect to the timeline, it's perhaps clearest to lay out the issues:

> I'm unable to square this description with https://bugzilla.mozilla.org/show_bug.cgi?id=1456655#c11 . That is, I don't believe it's a lack of communication, I believe it may be miscommunication, whether deliberate or not. The things you are telling us are not fitting with the things you are reporting you did not tell us.

Which part? It's not deliberate miscommunication, but there are some process issues that need to be addressed when it comes to ensuring the bugs and reporting are followed up on.

> Are you referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1456655#c24 ? Or is there some other comment you are referring to?

I was referring to the CAA bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1550645) and the scope bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1556948) which are using dev as the long-term system remediation for the "never again" part of the bug.

Thinking about this more, I did mis-communicate and fail to do what was promised. I'm sorry for that. Specifically, I promised to provide periodic updates on the shut down of these issuing CAs and haven't done so regularly. We got them all (except one) to the point of no longer issuing and then declared victory. That's really only the half-way point, and I made fun of a certain president for doing that very thing. I should have been following these all the way through until they really are turned off - revoked, expired, whatever we can do. I'll recommit to doing that and make sure we keep on top of these until they cross the finish line.

We are coming up with a process to ensure there are weekly reminders to post the status of this on the forum, so that no single person at DigiCert can drop the ball (as reported here: https://bugzilla.mozilla.org/show_bug.cgi?id=1563573). Brenda has identified about four more areas that haven't resulted in compliance issues yet but where we are going to implement additional processes. I may post incidents just to share them and the process with the community, to benefit other CAs.

Just to share how large a project this has been: we have successfully shut down 246 externally operated sub CAs since we acquired Verizon.

I just shared a document that shows the process we follow in tracking and following up on incidents. It's normally a Confluence document. We've updated it based on some of the dropped bugs (ABB) and on how we can do a better job of ensuring all open bugs are updated weekly. The process is the same as before, except we added that we review all open bugs during our Thursday meeting and respond during that meeting to each bug that hasn't been updated. This way we get the response out while we are thinking about and working through the issues. The one remaining risk is that when we need more data and there isn't good follow-up on an assignment, we could slip an update by a week. However, we did assign a specific person going forward (the IM) to track updates to Bugzilla and ensure bugs are updated weekly. This person was already assigned the task of tracking bugs internally; now they track them externally as well.

Jeremy: Thanks for that update. I think one thing to draw particular attention to is the need to be careful with how you measure/quantify whether or not a bug was updated. For example, a mistake that even I've run into is that the Bugzilla "Last Updated" field is not equivalent to when someone last commented on an issue. As a consequence, if someone were to, say, add themselves to the CC list, Bugzilla may register that as an update and thus reflect it in "Last Updated", even if there is still a response pending from DigiCert.

The process I use when reviewing the Incident Dashboard thus has to take into account all of the open bugs and ensure they're progressing in a timely fashion. For a single CA, reviewing all outstanding bugs assigned to DigiCert on a weekly basis, as a manual review, is a good way to make sure all updates are meaningful.
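
As an illustration of the "Last Updated" pitfall, a monitoring script would want to key off the CA's most recent comment rather than the bug's change timestamp. A minimal sketch follows; the `creator`/`creation_time` field names follow Bugzilla's REST API comment format, but `needs_response`, the 7-day window, and the email-domain heuristic are assumptions for illustration only:

```python
from datetime import datetime, timedelta

def needs_response(comments, ca_domain="digicert.com", max_age_days=7):
    """True when the newest comment from the CA is older than max_age_days.

    This deliberately ignores Bugzilla's 'Last Updated' field, which is
    bumped by CC changes and other metadata edits even when no one has
    actually commented.
    """
    ca_times = [
        # Bugzilla timestamps look like '2019-07-02T09:00:00Z'; strip the
        # trailing 'Z' so fromisoformat() accepts them on older Pythons.
        datetime.fromisoformat(c["creation_time"].rstrip("Z"))
        for c in comments
        if c["creator"].endswith("@" + ca_domain)
    ]
    if not ca_times:
        return True  # the CA has never commented at all
    return datetime.utcnow() - max(ca_times) > timedelta(days=max_age_days)

# The bug may show 'Updated 9 hours ago' because someone CC'd themselves,
# yet the CA's last comment is stale - an update is still overdue:
comments = [
    {"creator": "wayne@example.org", "creation_time": "2019-07-01T10:00:00Z"},
    {"creator": "jeremy.rowley@digicert.com", "creation_time": "2019-07-02T09:00:00Z"},
]
print(needs_response(comments))  # True: the 2019 comment is long past 7 days
```

In a real dashboard the comment list would come from `GET /rest/bug/{id}/comment`, and the threshold would match whatever update cadence was agreed in the bug.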

In terms of other steps, note that under your Bugzilla Profile, under Settings/User Preferences, you can watch entire components. For example, watching the NSS Product, CA Certificate Compliance Component, can ensure you will receive notifications for every bit of activity for all CA compliance bugs, regardless of the CA. It would seem that your Product Compliance process would benefit from not just reviewing outstanding issues for your CA, but proactively monitoring issues that other CAs face, to determine if there may be similar issues, or to re-evaluate assumptions or expectations.

We are going to measure it as whether Brenda or I (or one of our direct reports) posted a status update in the last week. Hopefully going forward the number of incidents will be small enough that we can review the zero incidents in zero minutes. :)

I would like to elaborate on our Third Party Risk Management Program, to further describe in full transparency how we manage our external subCAs, given the subject of this bug: failure to supervise a subordinate CA. If posting this on Bugzilla is not the appropriate channel, I'd appreciate counsel on that.

In January 2018, a discussion was held between the Exec team at DigiCert and our Compliance team on the strategy to manage external subCAs. The plan we put in place was to assign resources specifically focused on bringing the external subCAs into alignment with our risk management plan, adhering to policy and operational practice. We initiated weekly reporting and meetings with the executive team on the status of each subCA and any re-alignment plans we were executing.

The immediate actions that were undertaken as part of this effort included:

  • Systematic revocation of Intermediate CAs, where appropriate. We have revoked a total of 246 intermediate CAs that were externally managed.
  • Distribution of the 3rd party CA / RA policy that enumerates the industry standards, policies and operational procedures we expect the external subCAs to adhere to, including conformity to the Baseline Requirements and Mozilla Policy. This served as a reminder to these parties of the requirements they have to uphold in order to remain in good standing and conduct business with DigiCert.
  • We issued cease-issuance letters to external SubCAs (including ABB) that were no longer issuing and were either waiting for certificates to naturally expire or only processing OCSP/CRL requests.
  • Instituted a bi-weekly Compliance status call to discuss each external subCA's audit standing, contract renewals and any incidents.
  • Established frequent communication with each external SubCA operator when their audit is coming due. We found challenges with third-party audits; below I describe what we did to mitigate the risk with updated policy and procedure. For reference, the challenges were further described in the incident report that we opened: https://bugzilla.mozilla.org/show_bug.cgi?id=1539296 (section 7).

Although we have had issues with timely reporting on the ABB subCA as of late, our intent is to revoke ABB intermediate CA 5 on 30 August 2019. We expect to proactively manage these relationships and incidents in the manner described in the document attached above in comment 9 (incident flow).

We are continuing to manage the external subCAs that operate intermediates, with a focus on a risk-based approach and continuous monitoring.

Since the beginning of 2019, additional improvements and accomplishments we have made to our Third Party Risk Management program include:

  • Updating our policy to include a required scoping review of audits with DigiCert before they commence
  • Quarterly newsletters distributed to inform the external subCA operators of upcoming policy updates (e.g. Mozilla Policy 2.7, and revisions to our own CPS).
  • Annual onsite audits (rotation every other year for each of our legacy Symantec subCAs) from our Compliance team. Since the external subCAs’ annual WebTrust or ETSI audits cover certain scope such as Baseline Requirements, we expect (and require) that our external subCA operators go above and beyond when it comes to managing and securing their CA.
    The scope of our review includes:
    o Conformance to security standards that have been agreed to such as security camera configuration, physical security and software patching.
    o Review of Mozilla policy adherence and discussion of future requirements such as MozPol 2.7.
  • Another notable accomplishment is that we have also fully shut down 6 external SubCA operators, with 2 more underway (these totals include QuoVadis). Only two of our external subCAs are still issuing TLS (CTJ, Microsoft).

We review on a semi-annual basis (and more frequently as issues arise) how we can improve our program. We welcome your constructive input on other considerations.

On the TLS-capable side, the following non-browser CAs still exist:
3 CAs operated by ABB - Revocation scheduled on Aug 30
8 CAs operated by CTJ - Working with CTJ to determine timeline
14 CAs operated by Siemens - Verizon CAs have stopped issuing; Quovadis working on a plan
1 CA operated by T-Systems - No longer issuing. Waiting for the certs to expire (Jan 4, 2020)
2 CAs operated by Verizon - No longer issuing. Last cert expires in 2021. Audit was posted today.
3 CAs operated by Darkmatter - Being distrusted by browsers

Brenda and Jeremy: thank you for the information provided in comments 13 and 14. It certainly helps to address the subject of this bug. I'll await confirmation that the ABB CAs have been revoked.

Updated information - the following still exist for TLS:
3 CAs operated by ABB - Revocation scheduled on Aug 30
8 CAs operated by CTJ - Cease issuance on April 1, 2020. Revocation proposed on Sept 1, 2021.
14 CAs operated by Siemens - Verizon CAs have stopped issuing; Need update from Quovadis
1 CA operated by T-Systems - No longer issuing. Waiting for the certs to expire (Jan 4, 2020)
2 CAs operated by Verizon - No longer issuing. Working on audit remediation. Will revoke end of March 2020.
