(In reply to Brenda Bernal from comment #0)
We have gathered the feedback so far from all impacted customers and filing this bug for a delay in revocation. We wanted to file this early given the reasons for the delay and to get the thoughts of the browsers about the additional compliance issue (of failing to revoke).
I mean, browsers can't and don't grant exceptions. Any failure to revoke would be a serious compliance issue that would impact trust in a CA going forward, even if it may not lead to immediate distrust. As we've seen from CAs that have a series of compliance issues, continuing to have compliance issues can ultimately result in loss of trust of the CA entirely.
For a majority of the customers, we are able to execute on the Saturday, July 11th deadline, but the infrastructure and use of certificates by others makes the ecosystem impact significant that we felt it would be appropriate to take another compliance hit by stating our case for the delay.
To be clear: This is not about "taking a hit". This is about demonstrating what makes this situation exceptional, which in this context generally means "unprecedented, new, or novel". The expectations are touched on at https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , and I don't think this incident really rises to that level.
As noted in our prior filed bug (1650910): DigiCert has halted all EV issuance across CAs omitted from an EV audit report. All EV issuance from these are on ICAs that will be covered in an EV audit.
I think there's concern about whether or not we can retroactively consider an EV audit having resolved this, so revocation is indeed the right answer. The response for dealing with such audit inconsistencies has, historically, been to revoke the entire intermediate and start fresh, due to the loss of trust and confidence. Revoking only the affected certificates appears to be a less-impactful option, and it's up to DigiCert to demonstrate that would sufficiently mitigate concerns, versus revoking the entire intermediate.
There are several that are asking for 45 days from the initial event. We have explained to these requests violate the industry requirements that we adhere to, but given the extraordinary situation of 2020, we thought we’d like to raise it here. Most of the 45 day requests involve a point of sales device, an ATM, or similar device that is using the WebPKI for non-web purposes.
The traditional response for dealing with such "non-web purposes" has been to revoke the intermediate via OneCRL. While Mozilla has seen this as an appropriate remediation, other browsers, including Google, have recognized that it fails to protect the broader ecosystem. Revoking the intermediate in Firefox may offer a viable alternative to revoking only the EV certificates, but I suspect that, despite customer assertions, they are likely using these certificates on the Web.
This specific use case is, as DigiCert knows, not an unfamiliar one.
We recognize the need to separate non-web from web application and are exploring how to do that better. For now, we have recommended to customers using non-web devices that they replace their certificate with a separate root of trust.
This is not a new response for DigiCert, so I fail to see how these circumstances are exceptional. The same response was provided in the past - e.g. Bug 156561 - and DigiCert is no stranger to the concerns around exceptional events, such as those from Bug 1516453, Bug 1517617, Bug 1516599, or Bug 1516545. While DigiCert provided the experience in these past issues, I think the same response would be applicable regardless of the CA, as all CAs use these incident reports to better inform their policies and practices and ensure their Subscribers are apprised of the risks.
- More time to coordinate with third parties. Some of these are not operated by the requester or are distributed to third parties. Several payment and banking institutions require lead time to do testing, approval, and sign off. This reason is cited in the previous bug related to the underscore issue. Although we have mentioned this is essentially a SEV1 situation, the sign off process prevents a rapid replacement of installed certificates.
In the past, DigiCert had committed that this would not be an issue going forward. I'd be concerned if the steps DigiCert took a year ago, to move such customers off, as all publicly-trusted CAs were expected to do, were insufficient.
- COVID and ongoing lockdown restrictions. Accessing the data center and getting staff available at key locations has taken longer than expected simply because people are not available or are unable to travel at this time. A third of the certificates in this group are actively being used to monitor COVID patients and can’t have an outage.
In the past, DigiCert had committed that datacenter access issues were not going to be an issue going forward, by deploying and promoting automation solutions. It seems these Subscribers made an intentional choice not to adopt these practices to mitigate the risk?
- Regulatory bodies. Mobile banking applications, payment systems processing salaries, supplier payments, and online services run by financial institutions require some form of change process controlled by regulatory bodies. We are researching where these regulations may require more than 5 days. If this is true, then it would conflict with the policy and be something to present to the CAB forum as a conflict with local law. More likely the regulations just involve sufficient burden that it makes 5-day replacements impractical rather than impossible. Changing a cert on a mobile browser does require significant effort to have the update pushed to end users.
I think browsers would take a particularly dim view of such an approach. I think a CA that either failed to do their due diligence in ascertaining whether customers posed such risk, or actively courted such customers, would be an existential risk to the continued trust of that CA. The reason is that the same reasoning could be seen as actively promoting the issuance of MITM certificates, which is also explicitly forbidden, by attempting to use "local law" as a justification.
The clauses with respect to 9.16.3 are meant to be exceptional, and disclosed beforehand, which is explicitly noted as a SHALL requirement. By issuing the certificate, the severability clause was not exercised.
- US Government operations. Several of these certificates are key to the operations of the US government. These require approval. Although they have not stated that COVID is delaying approval, the approval process looks like it is taking longer than the five days.
- Pinned applications. A majority of the delayed revocation relates to pinned certificates where they need to figure out how to change the pinning of the cert. These are the ones we’d like to revoke on July 30. They have mentioned the pins don’t expire by then. We are working with them to see what we can do to still revoke within July.
This is unacceptable. There has been sufficient communication, to CAs, about the risk of pinning. CAs cannot be seen as responsible for their Subscribers misusing that CA's certificates, nor can the community be seen as having to bear that risk.
- Tax season impact. With tax season deadline this year delayed, several tax organizations are on a freeze until July 15th.
DigiCert, and the CAs they acquired, such as Symantec, have used this explanation in the past. This would be a serious regression, and if it turned out these customers were previous Symantec customers, this would pose significant risk to the continued trust of DigiCert, who committed to the broader community that the set of issues that ultimately led to the loss of trust in Symantec, some of which were intentional choices by Symantec to prioritize customers over behaving in a trustworthy fashion, would not similarly plague DigiCert.
We’re hoping for a response from the browsers about the next steps and their thoughts on the delay. We will continue to provide updates and working to still accelerate revocation and replacement over the next few days. As part of this remediation, we would like to take an active role in separating out non-web PKI from the web and are looking for recommendations on how to better accomplish that task. We are also promoting use of automation as a key part of any certificate solution, including investing more in improving our ACME and other certificate automation tools.
DigiCert already committed to this, a year ago. I think it's disheartening to think that there has been no progress. I understand that the relationship between DigiCert and its customers is complex, but it's precisely because these were things DigiCert already committed to in light of past incidents that it should have been able to highlight to current customers that there could be no future incidents without impacting DigiCert's trust.
I realize that customers of DigiCert will be directed to this, and perhaps chime in on why it's unreasonable to have expectations at all, and shouldn't everything be treated bespoke. We have requirements to treat CAs consistently, fairly, and to have an objective standard of trustworthiness. There is nothing inherent to a CA that prevents these obligations from being met: they are design trade-offs a CA knowingly makes. These expectations have been stable for years, and CAs that have struggled to meet the expectation in the past, such as DigiCert, have been given time to design systems to better align. Certificates that cannot be replaced within the time specified, by their very existence, pose a serious threat to the ability to respond. Just like we don't purchase fire insurance once our house is on fire, and expect everything to be OK, we don't wait to implement good practices until the serious security incident is here. The ecosystem has already faced a number of significant security events, from DigiNotar to Heartbleed to Symantec's distrust, and each of these revealed inadequacies in the system that CAs are supposed to be working to correct.
The choice on whether to revoke or not is, ultimately, DigiCert's, and that's called out in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation . But I think DigiCert is very aware of the serious concerns that have come up, both with their CA and the broader trend of CAs that have attempted to have "the ecosystem" pick up the tab for CAs' and customers' decisions, and we have to move past that. When customers make a decision to use a publicly trusted CA, that CA is expected to explain their policies, which they do, as part of the Subscriber Agreement and CP/CPS. If the customer still uses the product in a way that it's neither intended for nor safe to use, it cannot be seen as "someone else's fault" when things go awry.
The policies that ensure prompt revocation keep all sites safe. They're the same policies that help protect your sites from hackers or abuse, by ensuring that if there were certificates you did not authorize, they are promptly revoked. The reason we expect consistency, regardless of severity, is precisely because the moment "severity" becomes an area for determining policy, we encourage CAs to adopt selectively-lax policies, which increases the risk, for everyone, of more high-severity incidents. This isn't theory; it's the exact system browsers have been trying to fix for the past 15 years, after the well-publicized "PKI race to the bottom" that ultimately resulted in DigiNotar: a series of "minor" issues that created an opportunity for a major, company-ending issue.
If DigiCert decides not to revoke, the burden rests on demonstrating why this is exceptional. Analyzing past incident reports, from DigiCert and other CAs, to see whether similar issues were encountered or similar commitments were made, is essential. This is because DigiCert will need to demonstrate, per-Subscriber, why the situation is exceptional, and have a comprehensive plan to prevent this underlying issue from happening again. Not ideas, but a timeline with concrete plans. For those customers that need additional time, it'll be important to identify specifically who they are and what steps they're taking, in order to ensure they don't simply shift to a new CA and cause the same compliance issues at a new CA.
I realize COVID-19 is truly exceptional here, and this understandably creates a host of challenges. The only information to determine whether something is reasonable or not is based on what's provided in these incidents, the analysis and comparison to past incidents, and the commitments going forward. I remain deeply concerned on this issue, with the information available, because it feels like either "We don't know how to solve this" or "We aren't willing to solve this and are hoping browsers will be the 'bad guy' on this" or, worse, "our customers don't understand how harmful what they're asking for is to the security of all users". I suspect that, if there is any delay, specific commitments from those customers, which can help better improve the security for all users, would be a minimum bar to expect.