Sectigo: DCV Reuse after 825 days
Categories: CA Program :: CA Certificate Compliance, task
Tracking: Not tracked
People: Reporter: rob, Assigned: rob
Whiteboard: [ca-compliance] [ev-misissuance] [ov-misissuance] [dv-misissuance]
Attachments: 2 files
1. How your CA first became aware of the problem.
In bug 1694233 comment 15 and bug 1694233 comment 20, we announced that we would "conduct a full review of our DCV components, software and process" and we outlined the phases in our action plan for this project. The first phase (IDENTIFY ACTIVE DCV METHODS) uncovered deficiencies in our CPS (see bug 1714628), but the second phase (ANALYZE ACTIVE DCV METHODS) was barely underway when we became aware of the issues explained in bug 1712188. Whilst considering the revelations in that bug, we realized that in addition to reviewing all of our DCV code it would be hugely beneficial to kick off a parallel effort to exhaustively review our "DCV data".
We wrote a script to perform a compliance audit of the DCV method used by every SAN dNSName and ipAddress in our corpus of unexpired, publicly trusted server certificates. This is the same "custom script to find affected certificates" that was "a large query that must run for several days" that Tim referred to in bug 1714628 comment 4. We expected this script to uncover every occasion that we'd relied on "manual DCV" (see bug 1718579), or relied on a DCV method that was missing from our CPS (see bug 1714628); and since "Re-use of DCV" was also in scope for our DCV Review project (see bug 1694233 comment 20), this script also audited the elapsed time between each DCV check and the issuance of corresponding certificate(s). We were confident that this script wouldn't miss any anomalies, but due to its complexity we thought it quite likely that the first run of the script would yield false positives. Consequently, we anticipated that for a subset of the results of the first scan we would not be able to immediately reach a determination about whether or not revocation was necessary. We planned to iteratively review the results of the scan, then tweak the script as necessary to deal with any groups of false positives identified by that review, and then run the modified script to produce another set of results.
Early results from the first run of the script suggested that we might have one or more bugs relating to DCV Reuse outside the maximum reuse period of 825 days permitted by the BRs, and so determining whether or not these were false positives was one of the things that our iterative review looked at.
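The per-SAN check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the method names, and the finding labels are hypothetical and are not taken from the actual audit script or database schema.

```python
from datetime import datetime, timedelta

MAX_REUSE = timedelta(days=825)  # maximum reuse period cited in this report

def audit_san(issued_at, dcv_checks, cps_methods):
    """Classify one SAN entry; return None when no anomaly is found.

    dcv_checks: list of (method, checked_at) tuples on record for the SAN.
    cps_methods: set of method names assumed to be documented in the CPS.
    """
    # Only checks that happened at or before issuance can support the certificate.
    prior = [(m, t) for m, t in dcv_checks if t <= issued_at]
    if not prior:
        return "no-dcv-record"
    compliant = [t for m, t in prior if m in cps_methods]
    if not compliant:
        return "method-not-in-cps"
    # Elapsed time between the most recent compliant check and issuance.
    if issued_at - max(compliant) > MAX_REUSE:
        return "dcv-older-than-825-days"
    return None
```

A finding from a function like this is only a candidate: as the report explains, each group of findings still had to be reviewed for false positives.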
2. A timeline of the actions your CA took in response.
2011-11-10: Recent DCV Reuse code is deployed.
2016-04-12: Same Public Key DCV Reuse code is deployed.
2021-05-24: R&D begins implementation of the audit script.
2021-05-27: Audit script v1 starts running.
2021-06-06: Audit script v1 finishes running.
2021-06-07: The v1 results are sent to the Compliance team, which confirms a subset for revocation (see bug 1714628) and sends the remainder to a group of R&D and Validation staff for further analysis.
2021-06-10: Several groups of false positives are identified, and R&D commences tweaking of the audit script accordingly.
2021-06-11: R&D concludes that our initial hypothesis (that we might have bugs relating to DCV Reuse) is likely to be correct, and begins to review our DCV Reuse code.
2021-06-14: R&D identifies probable bugs in our DCV Reuse code and begins implementation of bugfixes.
2021-06-14: Audit script v2 starts running.
2021-06-14: In bug 1712188 comment 9 we announce that our various efforts to review and update our DCV code "are for all purposes now a single project". (This incident report should clarify for readers how integrated these efforts had already become at this point).
2021-06-21: Audit script v2 finishes running.
2021-06-22: The v2 results are sent to the same group of Sectigo R&D and Validation staff for further analysis.
2021-06-22: A candidate bugfix code release is committed to our version control repository and deployed to our QA system. The QA team starts testing this code release.
2021-06-23: QA identifies a regression. A fix is developed, committed, and deployed to our QA system.
2021-06-24: QA identifies another regression. A fix is developed, committed, and deployed to our QA system.
2021-06-25: QA completes successfully, and Release Managers confirm the final code release for our next Production deployment window.
2021-06-26: R&D implements some further minor tweaks to the audit script.
2021-06-27: The final code release is deployed to Production. Immediately afterwards, audit script v3 starts running.
3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.
Our Production code release on 2021-06-27 comprehensively fixed all known problems with our DCV Reuse mechanisms. Consequently, we are no longer issuing certificates that rely on outdated proofs of domain control.
4 & 5. A summary of, and the complete certificate data for, the problematic certificates.
Audit script v3 will identify the full list of potentially problematic certificates that were issued prior to deployment of the code fix. We expect that, as before, it will take several days to run. Upon its completion, we will isolate the known problematic certificates from those that will require another round of closer examination. We will then provide a summary and crt.sh links for the known problematic certificates, and we will revoke all of them within the BR-mandated time period. Closer examination of the potentially problematic certificates will be a high priority for us but may take a few days.
6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
BR 4.2.1 says:
"The CA MAY use the documents and data provided in Section 3.2 to verify certificate information, or may reuse previous validations themselves, provided that the CA obtained the data or document from a source specified under Section 3.2 or completed the validation itself no more than 825 days prior to issuing the Certificate."
Our Policy Authority asked our R&D department to implement two mechanisms for DCV Reuse:
- Recent DCV Reuse (added 2011): When a customer requests a replacement certificate, the control of any domains for which that customer's account successfully completed DCV within the previous 1 week does not need to be revalidated. This mechanism was added shortly before CABForum adopted the first version of the BRs, so at that time the rules around DCV Reuse weren't as clear as they are today. The Policy Authority took the view that a short DCV reuse period would be acceptable, having been urged by other departments that permitting this would really help with some customer usage patterns.
- Same Public Key DCV Reuse (added 2016): When a customer requests a replacement certificate and provides either the same CSR as before or another CSR that contains the same public key, the control of any domains (that are common to the to-be-replaced certificate and the replacement certificate request) for which that customer's account successfully completed DCV within the previous 825 days does not need to be revalidated.
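As a minimal sketch of the two eligibility rules as specified above (not of the deployed code), the function below decides whether revalidation may be skipped for one domain on a replacement request. The function name, parameters, and return labels are assumptions made for illustration.

```python
from datetime import datetime, timedelta

RECENT_WINDOW = timedelta(days=7)      # "Recent DCV Reuse": previous 1 week
SAME_KEY_WINDOW = timedelta(days=825)  # "Same Public Key DCV Reuse": previous 825 days

def may_skip_revalidation(domain, now, last_dcv_at, same_public_key, replaced_cert_domains):
    """Return the name of the reuse mechanism that applies, or None.

    last_dcv_at: when this customer account last completed DCV for the domain,
    or None if it never did. replaced_cert_domains: domains on the certificate
    being replaced.
    """
    if last_dcv_at is None or last_dcv_at > now:
        return None
    age = now - last_dcv_at
    if age <= RECENT_WINDOW:
        return "recent"
    # The longer window needs the same public key AND a domain common to both certificates.
    if same_public_key and domain in replaced_cert_domains and age <= SAME_KEY_WINDOW:
        return "same-public-key"
    return None
```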
In bug 1712188 comment 16, Tim wrote about how "we did not effectively indoctrinate all our new employees with a detailed understanding of the accepted practices of a public CA" since the split from Comodo, and he described a large proportion of these new employees as "PKI outsiders". Although amplified by the rate of Sectigo's growth, this problem actually has roots before that corporate split. Of particular relevance to this bug is that some R&D staff were "PKI outsiders"; and although these developers usually worked on peripheral tasks such as account management features and UI, there were occasions when they were asked to write code that required a working knowledge of at least a subset of CABForum documents and browser root program policies.
The same developer, who I class as a "PKI outsider", was tasked with implementing both of the DCV Reuse mechanisms described above. On both occasions, the code suffered from a couple of poor design choices:
- The code relied on a static block-list of disallowed DCV methods when verifying that a successful DCV check had occurred within the previous 1 week or 825 days.
- The code recorded instances of DCV Reuse using the same database fields that were already recording CABForum-compliant DCV checks.
Code review at the time did not flag these as areas of concern. Nobody involved documented the need to ensure that we always kept the block-list up-to-date (I suspect that nobody foresaw the fairly recent proliferation of BR 3.2.2.4 subsections!) or identified the conflation of DCV records and DCV Reuse records as a footgun.
An ad-hoc review of a small sample of the preliminary results from our audit script found a number of cases of DCV Reuse where the script had been unable to locate any record of a successful DCV check that had occurred within 825 days of certificate issuance, which prompted us to review the code behind those mechanisms. We found the same bug in both mechanisms: When we skip DCV revalidation due to "Recent DCV Reuse" or "Same Public Key DCV Reuse", we record this fact; however, due to the block-list be(com)ing incomplete, both mechanisms were (incorrectly) treating records of DCV Reuse as if they were records of CABForum-compliant DCV. Although we often limit to well below 825 days the time period during which a customer can continue to obtain certificate replacements before they have to place a fresh order, our code review confirmed that, due to this buggy behaviour, it would have been possible for replacement certificates that relied on DCV checks performed more than 825 days earlier to have been issued.
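The failure mode described above can be reproduced in a few lines. The sketch below is a simplified model, not the production code: the method names, the contents of the block-list, and the dates are all hypothetical. It shows how a reuse record that is stored alongside real DCV checks, and that is missing from the block-list, lets each reuse lean on the previous reuse.

```python
from datetime import datetime, timedelta

MAX_REUSE = timedelta(days=825)
BLOCK_LIST = {"manual"}  # hypothetical; note that "reuse" was never added to it

def latest_usable_record(records, now):
    """Buggy lookup: any record whose method is not block-listed counts as DCV."""
    usable = [t for method, t in records
              if method not in BLOCK_LIST and timedelta(0) <= now - t <= MAX_REUSE]
    return max(usable, default=None)

records = [("dns-txt", datetime(2017, 1, 1))]  # the only real DCV check
for when in (datetime(2019, 1, 1), datetime(2021, 1, 1)):
    if latest_usable_record(records, when) is not None:
        # Reuse is logged in the same place as real checks, so the next
        # replacement request can "reuse" this reuse record.
        records.append(("reuse", when))

real_check = records[0][1]
last_issuance = records[-1][1]
```

In this model the second replacement is accepted because of the 2019 reuse record, even though the only genuine check is then more than 825 days old.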
After the original implementation and before implementation of the code fix described in this incident report, the code for these two mechanisms had only been touched or looked at by other "PKI outsiders", so the buggy behaviour went undetected by R&D. Customers and our QA department did not spot it either: since we require a DCV method to be selected for each SAN entry when a fresh order is placed or when a certificate replacement is requested, it would have been quite reasonable for a certificate requester to assume that we had performed a fresh and successful DCV check using the DCV method that they had selected. Internal audit and our previous external auditors did not detect the problem in any random samples they reviewed, although our current auditors (who are new to us for the current audit cycle) independently stumbled upon one example of the problem in a random sample of issued certificates that we provided to them around the same time that we were reviewing the initial results of the audit script. Until our audit script revealed otherwise, our Compliance team and Policy Authority had no reason to suspect that these DCV Reuse mechanisms did not conform to the original requirements that the Policy Authority had specified.
7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
We have deployed a comprehensive code fix to our DCV Reuse mechanisms that has replaced the static block-lists with a new static allow-list of DCV methods that are both BR-compliant and compliant with our CPS. This will cause the possibility of non-compliant DCV Reuse to fail closed rather than fail open.
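The difference between failing open and failing closed can be illustrated with a short sketch. The list contents and method names below are hypothetical placeholders, not the actual lists in the code release.

```python
BLOCK_LIST = {"manual"}                         # hypothetical legacy block-list
ALLOW_LIST = {"dns-txt", "http-file", "email"}  # hypothetical allow-list

def reusable_with_block_list(method):
    # Anything not explicitly listed is accepted: fails open.
    return method not in BLOCK_LIST

def reusable_with_allow_list(method):
    # Anything not explicitly listed is rejected: fails closed.
    return method in ALLOW_LIST
```

With an allow-list, a record type that nobody anticipated (for example, a reuse record or a newly introduced method) is rejected until someone deliberately adds it.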
We have a ticket open to implement a new "DCV confirmation" mechanism for recording which BR-compliant DCV method was used to validate a SAN entry, and when. This mechanism will supplement the existing mechanism we have for recording this information, but without conflating DCV Reuse with BR-compliant DCV methods. Additionally, we will update our certificate issuance application to double-check that a "DCV confirmation" record exists for every SAN entry prior to issuance. This belt-and-suspenders approach will be useful for at least two reasons: (1) if there are any further not-yet-discovered bugs in our vintage code relating to DCV Reuse or not-permitted DCV methods, this new mechanism should both uncover them and prevent issuance; and (2) we intend to implement the "DCV confirmation" records in a manner that is better indexed and thus more easily searchable than our existing DCV check logs, which should mean that any future analysis of the kind described in this bug won't rely on a script that takes many days to run.
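A pre-issuance gate of the kind described above might look something like the following sketch. The record shape, the exception name, and the age check are assumptions for illustration; the report only states that a "DCV confirmation" record must exist for every SAN entry.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=825)  # assumed here; the report does not specify an age rule

class IssuanceBlocked(Exception):
    """Raised when at least one SAN entry lacks a usable DCV confirmation."""

def check_confirmations(san_entries, confirmations, now):
    """confirmations: dict mapping SAN -> (method, confirmed_at), holding
    BR-compliant checks only (reuse events are never stored here)."""
    missing = [san for san in san_entries
               if san not in confirmations or now - confirmations[san][1] > MAX_AGE]
    if missing:
        raise IssuanceBlocked("no valid DCV confirmation for: " + ", ".join(sorted(missing)))
```

Because the gate runs independently of the reuse logic, it would block issuance even if that older logic wrongly concluded that revalidation could be skipped.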
We will continue with the DCV Review project that we announced in bug 1694233. The planned methodical code review and documentation effort will further solidify our confidence in our code that pertains to DCV checking.
Bug 1712188 comment 16 (EMPLOYEE EDUCATION AND CULTURE) and bug 1714628 comment 9 explain how we seek to address the "PKI outsiders" issue. We recognize that process training as a substitute for programmatic controls is not adequate; but nonetheless, training of the people who must create programmatic controls is essential.
We have some concerns that DCV Reuse to the fullest extent permitted by the BRs may not provide adequate proof of domain control in some scenarios. Within the next few days I will post a comment to this bug to describe these concerns and explain how they affected the design of our DCV Reuse mechanisms.
Comment 1•4 years ago
Our investigation of affected certificates continues. This is an exceedingly complex investigation for several reasons.
- Our record of how DCV was performed for a domain in any individual certificate does not necessarily reflect the ONLY action that took place. At the time we issued any given certificate, it might be that our systems found an (incorrect) instance of DCV reuse and marked our records that way. However, it’s possible that we had nonetheless performed compliant DCV for this domain within the previous 825 days and that our simple data query is not returning that result. More sophisticated queries can reveal these false positives.
- There is no simple one-to-one relationship between domains and certificates. Many domains appear in more than one certificate, and many certificates contain more than one SAN (and sometimes dozens or even hundreds of them).
- Subscribers, especially those in the hosting business, tend to swap domains between certificates. One of the consequences of 90-day certificates, which in general we regard as a good thing, is that the mix of SANs can quickly change as the hoster’s automated systems shuffle customers between hardware to optimize resource use. This further complicates the task of crafting a data query that will reveal all relevant information for every certificate.
These factors combine to create a tangled web of relationships that we need to unsnarl before we can fully account for the DCV methods performed on each SAN in the available time period. We mentioned in comment 0 that we have had to fine tune our query on multiple occasions, adding complexity to eliminate false positives. We are still doing so.
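The first kind of false positive described above (a compliant check that exists but is not linked to the flagged certificate) could be screened for along these lines. This is a hedged sketch: the field names, the account-level matching, and the method names are assumptions, not the real query.

```python
from datetime import datetime, timedelta

MAX_REUSE = timedelta(days=825)

def is_false_positive(flagged, all_checks, compliant_methods):
    """flagged: dict with 'account', 'domain', 'issued_at' for a flagged SAN.
    all_checks: dicts with 'account', 'domain', 'method', 'checked_at' drawn
    from every order on record, not only the flagged certificate's order."""
    for check in all_checks:
        age = flagged["issued_at"] - check["checked_at"]
        if (check["account"] == flagged["account"]
                and check["domain"] == flagged["domain"]
                and check["method"] in compliant_methods
                and timedelta(0) <= age <= MAX_REUSE):
            return True  # a compliant check within 825 days does exist
    return False
```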
This process has already produced a very significant outcome. To put things in perspective, the result set of the most recent query is 0.2% of the size of the result set from the first query we ran. All those eliminated certificates were false positives. We have learned by looking at results, identifying false positives in the set, and modifying the query to account for them. We are digging into the results of the latest query and still expect to find additional false positives before we are done.
Therefore we have not yet identified any known problematic certificates from the last query. We anticipate being able to isolate individual tranches of certificates for which we’ve confirmed misissuance, and we intend to revoke and report them as these groups become available, rather than waiting for some massive research project to be completed. The batch of certificates we reported in bug 1714628 comment 4 is an example of that. We feel this is the best way to remove noncompliant certificates as expediently as possible, even though it makes reporting more complicated.
We expect this total process to take weeks, not months, and of course we will report our progress along the way.
Comment 2•4 years ago (Assignee)
(In reply to Rob Stradling from comment #0)
> We have some concerns that DCV Reuse to the fullest extent permitted by the BRs may not provide adequate proof of domain control in some scenarios. Within the next few days I will post a comment to this bug to describe these concerns and explain how they affected the design of our DCV Reuse mechanisms.
In https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/IQBCs7Ex-wo/m/PcR-pep8BQAJ, Ryan Sleevi wrote:
"...but it's worth noting, there's no strict binding between the domain holder and the Applicant/Subscriber. The Applicant/Subscriber is just the entity who requests the certificates and can demonstrate authorization from the domain holder."
We feel that this "no strict binding" deserves careful analysis. Let's consider several Applicant/Subscriber scenarios:
1. The Applicant/Subscriber is the domain holder. Since only the CA and the domain holder are involved in the DCV process, the CA can have complete confidence that a subsequent certificate request that is received on the same customer account has been initiated by the same domain holder. DCV Reuse is safe in this scenario.
2. The Applicant/Subscriber is a CDN or hosting provider that can complete DCV challenges on behalf of the domain holder. Since the CDN or hosting provider has a duty to ensure that they correctly handle delegated domain control (which includes keeping customers' sites isolated from each other and ensuring that a certificate request is only submitted to the CA on behalf of the correct customer), the CA can have a high degree of confidence that a subsequent certificate request received on the CDN or hosting provider's account has been authorized by the domain holder. DCV Reuse is safe in this scenario, as long as the CDN or hosting provider is doing correctly what's expected of them.
3. The Applicant/Subscriber is a certificate reseller that cannot complete DCV challenges on behalf of the domain holder. Since domain control has not been delegated, the certificate reseller has to ask the domain holder to complete the initial DCV challenge. Unlike a CDN or hosting provider, a certificate reseller is not expected to consider domain control before submitting a certificate request to the CA on behalf of a customer. DCV Reuse is, in our view, profoundly unsafe in this scenario. Consider this near-real-world example:
- Company A is a service provider. They have many customers. In this context they do NOT provide DNS or hosting or email services, but do offer certificates to their customers, often bundled with other services.
- Two of Company A's customers are Coke and Pepsi.
- Coke requests a certificate for coke.com via Company A. The CA receives the request, and a DCV challenge is completed successfully.
- Pepsi requests a certificate for pepsi.com via Company A. The CA receives the request, and a DCV challenge is completed successfully.
- If the CA permits DCV Reuse, Company A (who the CA considers to be the Applicant/Subscriber) now has the ability to request further certificates for coke.com and pepsi.com without any further DCV checks being required prior to issuance.
- Given that Company A is not expected to consider domain control when its customers request certificates, it appears that there would be nothing to stop Coke from obtaining a certificate for pepsi.com or to stop Pepsi from obtaining a certificate for coke.com. This is because, as far as the CA is concerned, Company A is the Applicant/Subscriber and Company A has demonstrated control of both domains; and therefore, reuse of the original DCV evidence for both domains is permitted for 825 days.
Our Same Public Key DCV Reuse mechanism is not exposed to the risk described above. Coke wouldn't have a copy of Pepsi's private key, and vice-versa.
Our Recent DCV Reuse mechanism mitigates the risk by limiting the reuse period to 7 days, which is orders of magnitude shorter than the 825 days currently permitted by the BRs.
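The reseller example above can be modelled in a few lines to show why binding reuse to the public key closes the gap that account-scoped reuse leaves open. This is an illustrative model only; the account name and key labels are hypothetical.

```python
def reuse_by_account(validated, account, domain):
    # Account-scoped reuse: any prior DCV on the account for the domain suffices.
    return (account, domain) in validated

def reuse_by_key(validated_with_key, account, domain, public_key):
    # Key-scoped reuse: the request must also present the previously used key.
    return (account, domain, public_key) in validated_with_key

# Company A's account holds completed DCV for both of its customers' domains.
validated = {("company-a", "coke.com"), ("company-a", "pepsi.com")}
validated_with_key = {
    ("company-a", "coke.com", "coke-key"),
    ("company-a", "pepsi.com", "pepsi-key"),
}

# Coke, acting through Company A's account, asks for pepsi.com with its own key.
account_scoped = reuse_by_account(validated, "company-a", "pepsi.com")
key_scoped = reuse_by_key(validated_with_key, "company-a", "pepsi.com", "coke-key")
```

Here `account_scoped` is `True` (the request would be served without a fresh DCV check) while `key_scoped` is `False`, which matches the reasoning that Coke would not hold a copy of Pepsi's private key.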
For completeness, I'll also mention that we have a third DCV Reuse mechanism that we call Sticky DCV. We usually limit the reuse period for Sticky DCV to 365 days. Whereas our (7-day) Recent DCV Reuse mechanism automatically works for any domain, Sticky DCV only works when a member of our Validation staff has preauthorized an Authorization Domain Name to a particular customer account. In nearly all cases, we only enable Sticky DCV for enterprise customers that use our Sectigo Certificate Manager platform and that fall under scenario 1 (i.e., the Applicant/Subscriber is the domain holder).
Comment 3•4 years ago
Thanks for sharing the analysis, Rob. Yes, this is part of why browsers have been keen to discuss reductions in the domain reuse period; recall that in the past, representatives of Google and Mozilla have suggested in the CA/Browser Forum that the target be something much shorter (e.g. 30 days or less, with some suggestions at one day or less). The reuse of DCV fundamentally causes mismatches with the DNS.
(In reply to Rob Stradling from comment #2)
> - The Applicant/Subscriber is a CDN or hosting provider that can complete DCV challenges on behalf of the domain holder. Since the CDN or hosting provider has a duty to ensure that they correctly handle delegated domain control (which includes keeping customers' sites isolated from each other and ensuring that a certificate request is only submitted to the CA on behalf of the correct customer), the CA can have a high degree of confidence that a subsequent certificate request received on the CDN or hosting provider's account has been authorized by the domain holder. DCV Reuse is safe in this scenario, as long as the CDN or hosting provider is doing correctly what's expected of them.
With respect to these scenarios 2 and 3, I'd like to suggest that they are, in effect, the same from the CAs' perspective. Whether one is dealing with a CDN or a Reseller is not typically one of a technical difference, and in both cases, the CA is relying on and assuming the intermediate (the CDN or Reseller) is appropriately segmenting out access controls.
You pose the hypothetical of a reseller account used to obtain certificates for Coke and Pepsi, as if that's a distinction from the cloud provider. However, as the issues with ACME's TLS-SNI-01 show, this situation equally exists for cloud providers. A number of cloud providers would allow, say, Pepsi to claim the coke.example domain name, but only if it was not on the IPs assigned to Coke's account for coke.example. Put differently: the cloud providers provided segmentation not based on the domain names their customers asserted, but based on the IP addresses assigned to the load balancers of different customers.
While Scenario 1 (Applicant == Subscriber) does reduce risks of confused deputy at the intermediate, it also doesn't eliminate them. Consider someone who sells or transfers their domain name, but fails to invalidate/transfer their Sectigo account. The original owner would continue to be able to obtain certificates (same key or different key), contrary to the intent.
To that end, it does seem useful to treat all of these scenarios as stepping stones towards further reduction in DCV, and working to identify (and bring transparency to) the challenges customers face. In particular, it's worth exploring if there are methods of DCV that can promote more regular DCV checks at issuance time, while minimizing the amount of configuration necessary for Subscribers.
Comment 4•4 years ago
Yesterday, 7/13/21, we revoked 96,002 certificates with DCV reuse beyond 825 days. We issued them between January 4, 2021 and June 26, 2021.
These certificates are all in the possession of a single, large subscriber. While the number of certificates revoked is more than 95,000, the number of actual registrable domains in this batch with DCV reuse beyond 825 days is only 1,365.
I’ve attached the full list of certificates in the file bug1718771_revocations_20210713.
Comment 5•4 years ago
Comment 6•4 years ago
Comment 7•4 years ago
Yesterday, July 14, we revoked 160 certificates from a variety of subscribers for DCV reuse beyond 825 days, issued between 5/29/2019 and 6/23/2021. They are included in the attachment bug1718771_revocations_20210714.
Comment 8•4 years ago (Assignee)
(In reply to Ryan Sleevi from comment #3)
> Thanks for sharing the analysis, Rob. Yes, this is part of why browsers have been keen to discuss reductions in the domain reuse period; recall that in the past, representatives of Google and Mozilla have suggested in the CA/Browser Forum that the target be something much shorter (e.g. 30 days or less, with some suggestions at one day or less). The reuse of DCV fundamentally causes mismatches with the DNS.
Thanks Ryan. I'm glad to hear that we're on roughly the same page.
> With respect to these scenarios 2 and 3, I'd like to suggest that they are, in effect, the same from the CAs' perspective. Whether one is dealing with a CDN or a Reseller is not typically one of a technical difference, and in both cases, the CA is relying on and assuming the intermediate (the CDN or Reseller) is appropriately segmenting out access controls.
Yes, absolutely. My "DCV reuse is safe...as long as..." caveat on scenario 2 was not intended to imply no risk. Scenario 3 seems riskier than scenario 2, but both scenarios seem to flout the spirit of BR 1.3.2's implied (i.e., Default Deny) requirement that DCV mustn't be delegated to a Delegated Third Party.
> You pose the hypothetical of a reseller account used to obtain certificates for Coke and Pepsi, as if that's a distinction from the cloud provider. However, as the issues with ACME's TLS-SNI-01 show, this situation equally exists for cloud providers. A number of cloud providers would allow, say, Pepsi to claim the coke.example domain name, but only if it was not on the IPs assigned to Coke's account for coke.example. Put differently: the cloud providers provided segmentation not based on the domain names their customers asserted, but based on the IP addresses assigned to the load balancers of different customers.
Good point. I'm not aware of any cloud providers that are Sectigo customers, so I hadn't considered this use case.
> While Scenario 1 (Applicant == Subscriber) does reduce risks of confused deputy at the intermediate, it also doesn't eliminate them. Consider someone who sells or transfers their domain name, but fails to invalidate/transfer their Sectigo account. The original owner would continue to be able to obtain certificates (same key or different key), contrary to the intent.
True, and this is one reason why our Sticky DCV reuse mechanism is generally limited to enterprise customer accounts.
> To that end, it does seem useful to treat all of these scenarios as stepping stones towards further reduction in DCV, and working to identify (and bring transparency to) the challenges customers face. In particular, it's worth exploring if there are methods of DCV that can promote more regular DCV checks at issuance time, while minimizing the amount of configuration necessary for Subscribers.
Sectigo's current view is that the next stepping stone for the BRs should be to forbid (or drastically reduce) DCV Reuse for at least scenario 3 and perhaps also scenario 2, whilst continuing to permit DCV Reuse (for 825 days, soon to be reduced to 398 days) for at least Enterprise RAs. In our experience, our Enterprise RA customers are amongst the least agile in terms of adopting PKI automation, but benefit the most from DCV Reuse, whilst at the same time being the least likely of our customers to sell or transfer their domain name(s) without our knowledge.
We intend to wait for the m.d.s.p discussion (https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/5gNYSvbA57Q/m/7k2gZpIfAwAJ) to run its course, and then after that we will consider proposing a change to the BRs.
Comment 9•4 years ago
> 2021-06-27: The final code release is deployed to Production. Immediately afterwards, audit script v3 starts running.
Tim, do you have more information on the status and (intermediate) results of this audit script v3? This script was spoken of as an integral part of the discovery of problematic certificates in the initial report (section 4&5), but the result of the script hasn't been mentioned (as such) since.
Comment 10•4 years ago
On July 16 we revoked two certificates - issued September 23, 2019 and October 23, 2020 - for DCV reuse beyond 825 days.
Comment 11•4 years ago
(In reply to Matthias from comment #9)
> Tim, do you have more information on the status and (intermediate) results of this audit script v3?
We have been processing the results of that script and performing revocations, which we've reported in comment 4, comment 7, and comment 10. We are nearing completion of this task.
Comment 12•4 years ago
On July 20 we reported the revocation of 11 Manual DCV misissuances in bug 1718579 comment 3. This represents the conclusion of our DCV misissuance research and we have no additional revocations pending.
Comment 13•4 years ago
Are there any questions or comments on this issue?
Comment 14•4 years ago
We have been monitoring this bug for questions and feedback, and the community appears to have said its piece. Ben, should this bug be closed?
Comment 15•4 years ago
I will close this bug next Wednesday, 11-Aug-2021, unless additional questions are raised.