Closed Bug 1741777 Opened 3 years ago Closed 2 years ago

Sectigo: OCSP responses directly signed using root certificates without KU=digitalSignature

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tim.callan, Assigned: rob)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

1. How your CA first became aware of the problem

On August 8, 2021, bug 1725039 appeared regarding direct OCSP signing by Network Solutions using a root certificate that lacks the digitalSignature Key Usage. As this root predates the publication of the BRs, we were at that time of the opinion that it was 100% in compliance with the Baseline Requirements, and we advised Network Solutions accordingly. We also engaged in that thread to clarify our opinion.

After some dialog on this topic, bug 1725039 comment 10 declared that the Google Chrome Root Authority Program considered this practice to be noncompliant with the BRs. Sectigo is in the same situation as Network Solutions: we have been directly signing OCSP responses with roots that predate the BRs and do not have the digitalSignature bit enabled in the Key Usage extension. As root programs are able to issue judgements on matters like this one for CAs to follow, we now need to update our affected root certificates and report the issue here.

2. Timeline

January 1, 2004 (AAA Certificate Services)
December 6, 2006 (COMODO Certification Authority)
March 6, 2008 (COMODO ECC Certification Authority)
February 15, 2010 (COMODO RSA Certification Authority)
February 16, 2010 (USERTrust RSA Certification Authority, USERTrust ECC Certification Authority)
July 28, 2011 (COMODO Certification Authority, reissued)
Comodo creates root certificates intended for issuing certificates and CRLs and for directly signing OCSP responses, prior to publication of the BRs. The digitalSignature Key Usage bit is omitted, per our understanding of the guidance in the available RFCs at those times.

November 22, 2011
Version 1 of the Baseline Requirements adopted.

August 10, 2021, 15:29 PDT
Bug 1725039 appears.

August 11, 12:34 PDT
We offer Network Solutions the advice that bug 1725039 is invalid for reasons expressed in bug 1725039 comment 4, bug 1725039 comment 6, and bug 1725039 comment 8.

September 3, 16:02 PDT
In bug 1725039 comment 8 we post an extensive explanation of our understanding of the consistent practices regarding the use of pre-existing roots in conjunction with subsequent CABF or root store requirements.

September 9, 12:24 PDT
The Chrome Root Authority Program responds in bug 1725039 comment 10 making it clear that it considers such roots inappropriate for directly signing OCSP responses.

October 14, 08:30 PDT
The CABF face-to-face meeting features an agenda item to discuss the use of roots and certificates that existed prior to new and updated requirements.

October 26, 08:00 PDT
Sectigo holds an internal meeting of subject matter experts to evaluate the material relevant to this matter and our response to it, including bug 1725039, the results of the recent CABF face-to-face discussion, and our own root usage. Due to a key participant’s PTO this is the earliest we can conduct this session. We conclude that we should update our affected roots to include the digitalSignature Key Usage bit and kick off an internal project to that end.

October 26 to November 3
We socialize this plan internally.

November 3, 07:00 PDT
This plan is presented at Sectigo’s regularly scheduled management meeting, and management buy-in is obtained.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem.

This is not a matter of certificate misissuance.

4. Summary of the problematic certificates

This is not a matter of certificate misissuance.

5. Affected certificates

This is not a matter of certificate misissuance.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now

Comodo created multiple root certificates before the BRs existed. After the BRs went into effect, pre-existing roots continued to be honored in many kinds of instances, including those detailed in bug 1725039 comment 8.

Additionally, we continued to directly sign OCSP responses, without any concern expressed by any member of the community or root program representative, from the inception of the BRs until late 2021. During that time both we and Network Solutions passed annual WebTrust audits, with our auditors having full visibility into our roots and how we used them, and with no auditor ever expressing any concern about this practice. Now, however, a major browser manufacturer has told CAs not to directly sign OCSP responses with roots missing the digitalSignature bit, and so we are responding to this input.

For more detail on the history of the affected roots and the reasoning behind our decision-making, see bug 1725039 comment 8 and the IETF PKIX list posts mentioned in bug 1725039 comment 4.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future

We will create replacement root certificates for each root mentioned in the Timeline above, and we will submit them for inclusion in the major root programs.

Since the BRs first came into force, it has been our policy to include the digitalSignature Key Usage bit in all newly issued Root and Subordinate CA Certificates.

The proposed approach - generating new roots - seems like it introduces significant compatibility risk to clients, new and old alike, in dealing with path building with these new and old roots. Why not just issue and use delegated responders? What were the factors considered as part of the proposed path, and how were they weighed?

Assignee: bwilson → rob
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

The proposed approach - generating new roots -

Hi Ryan. Let me start by clarifying the nature of these “new roots”.

Although Sectigo is preparing to submit root inclusion requests to the major browser/OS root programs for a new generation of purpose-specific root certificates, this is not what Tim was describing in comment 0.

Comment 0 stated our intention to “create replacement root certificates for each root” because we had resolved to “update our affected roots to include the digitalSignature Key Usage bit”. Each of these “replacement root certificates” will share the same Issuer/Subject DN and Public Key as the existing root certificate that we intend it to replace.
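To illustrate why a replacement certificate with the same Subject DN and public key looks identical to an OCSP client, here is a minimal Python sketch. It assumes a deliberately simplified, hypothetical certificate model (`RootCert`, with placeholder key bytes and a made-up DN) rather than real X.509 parsing. RFC 6960 lets a response identify its signer either `byName` (the responder's subject Name) or `byKey` (the SHA-1 hash of the responder's subjectPublicKey BIT STRING value); both come out the same for the original and the replacement root, and only the Key Usage differs.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RootCert:
    """Hypothetical, simplified root certificate model (not real X.509 parsing)."""
    subject_dn: str
    public_key: bytes            # stands in for the subjectPublicKey BIT STRING value
    key_usage: frozenset[str]


def responder_id_by_name(cert: RootCert) -> str:
    # RFC 6960 ResponderID byName: the responder's subject Name.
    return cert.subject_dn


def responder_id_by_key(cert: RootCert) -> bytes:
    # RFC 6960 ResponderID byKey: SHA-1 of the subjectPublicKey BIT STRING value.
    return hashlib.sha1(cert.public_key).digest()


# Placeholder values for illustration only.
KEY = b"hypothetical-public-key-bytes"
old_root = RootCert("CN=Example Root CA", KEY, frozenset({"keyCertSign", "cRLSign"}))
new_root = RootCert("CN=Example Root CA", KEY,
                    frozenset({"keyCertSign", "cRLSign", "digitalSignature"}))

print(responder_id_by_name(old_root) == responder_id_by_name(new_root))  # True
print(responder_id_by_key(old_root) == responder_id_by_key(new_root))    # True
```

Because both identifiers match, a response signed by this key names neither certificate in particular; which certificate a client uses to validate it depends only on which one its trust store holds.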

seems like it introduces significant compatibility risk to clients, new and old alike, in dealing with path building with these new and old roots.

We don’t foresee any compatibility risk, based on our previous experience with replacing root certificates in this manner. There are already two ‘versions’ of the COMODO Certification Authority root certificate. The first is present in most root programs except for Microsoft’s; the second replaced the first in Microsoft’s root program back in 2011, in order to resolve a problem with CryptoAPI building the ‘wrong’ chain and EV status not being recognized. We are not aware of any compatibility problems relating to this set-up over the past 10 years.

We know from bug 1725039 comment 7 and from your comments on bug 1652581 that you’re well aware that your colleagues at GTS have recently gone through the exact same process (and indeed for the exact same reason - to add the digitalSignature Key Usage bit) of issuing a new set of root certificates that are almost identical to the root certificates they are intended to replace. These replacement GTS root certificates will be added to NSS in the ‘December 2021 Batch of Root Changes’, according to the Whiteboard on bug 1735407. Do you have a similar concern that what GTS are doing “introduces significant compatibility risk to clients”?

Why not just issue and use delegated responders? What were the factors considered as part of the proposed path, and how were they weighed?

We are heavily invested in non-delegated OCSP response signing, as I briefly described in the last paragraph of bug 1725039 comment 8. We recognized very early on that delegated OCSP response signing inflates the size of OCSP responses (IIRC they’re often ~4x larger than a non-delegated equivalent would be), and we were keen to make our OCSP service as performant as possible. We also did not want to take on the added complexity and burden of the certificate lifecycle management of delegated OCSP Signing certificates for each of the hundreds of subordinate CAs that we operate, nor were we keen to consume valuable HSM key storage space just for this purpose. These are the reasons why we were such an early adopter and were prepared to put in the effort to help fix OpenSSL and Firefox in order for non-delegated OCSP response signing to become viable in the WebPKI.

Before filing this bug we did discuss the alternative idea of putting in the engineering effort to enable our existing offline root CAs to use delegated OCSP response signing. However, we quickly concluded that this would be a backwards step that we do not want to take, for the same reasons that we adopted non-delegated OCSP response signing in the first place (smaller response size, more performant, less complex, no added certificate lifecycle management burden, no additional HSM-based keys needed) and also because our previous experience of replacing root certificates leads us to believe that our proposed approach will actually be a quicker, easier, and less risky project.

Flags: needinfo?(ryan.sleevi)

Do you have a similar concern that what GTS are doing “introduces significant compatibility risk to clients”?

Yes.

Let’s look at this two fold:

  • Unless the existing roots are revoked, there’s still a BR-violating action occurring, because you’re using the key pair of the existing root to sign these responses. I realize the temptation will be, as it always is for a CA being told that what they’re doing is non-compliant (and the consensus at the face-to-face was unambiguous on that conclusion), to suggest “change the BRs”, as we saw with forbidden practices like underscores.
  • If the existing roots are revoked, that will understandably create path building issues for clients that check revocation on roots, and be insufficient/no effect for clients that don’t.

The cost savings arguments are, equally, a little questionable, because this applies only to responses generated by the roots, e.g. for intermediates. The BRs allow very long lifetimes for such responses, indicating high cacheability by clients. Similarly, a number of clients, most notably CryptoAPI, prefer CRLs over OCSP for non-end-entity certificates, not to mention a number of clients that use alternative means of revocation checking for intermediates (Valid, CRLite, OneCRL, CRLSets).

I don’t believe the proposed approach actually remediates the compliance issue, precisely because signing with the same name and key means these are still OCSP responses issued by CAs without the digitalSignature bit set. They are also responses issued by doppelgängers with that bit set, but that is irrelevant to the violation.

As a practical matter, the plan outlined by Sectigo (and GTS) effectively prevents clients from actually enforcing this constraint. As we saw with previous issues with OCSP responder certificates, clients enforcing constraints (as Firefox does with “must not have CA bit set”) can be critical to mitigating risks from CAs.
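A rough way to see the enforcement problem: a client that requires digitalSignature on the certificate of a direct OCSP signer will accept or reject the very same response depending on which doppelgänger its trust store holds. The sketch below is hypothetical (the `Anchor` model, the `accepts_direct_ocsp` policy function, and the `"k1"` key identifier are all illustrative, and signature verification is reduced to a key-identity comparison).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """Hypothetical trust anchor with only the fields this check needs."""
    subject_dn: str
    key_id: str                  # identifies the key pair; identical for doppelgängers
    key_usage: frozenset[str]


def accepts_direct_ocsp(anchor: Anchor, signer_key_id: str, enforce_ku: bool) -> bool:
    """Hypothetical client policy for a response signed directly by a root key."""
    if anchor.key_id != signer_key_id:
        return False             # stand-in for "signature does not verify under this key"
    if enforce_ku and "digitalSignature" not in anchor.key_usage:
        return False             # the constraint discussed in this bug
    return True


old = Anchor("CN=Example Root CA", "k1", frozenset({"keyCertSign", "cRLSign"}))
new = Anchor("CN=Example Root CA", "k1",
             frozenset({"keyCertSign", "cRLSign", "digitalSignature"}))

# One response, signed by key "k1", evaluated without and with enforcement:
for name, anchor in (("old", old), ("new", new)):
    print(name,
          accepts_direct_ocsp(anchor, "k1", enforce_ku=False),
          accepts_direct_ocsp(anchor, "k1", enforce_ku=True))
# old True False
# new True True
```

Under these assumptions, turning enforcement on breaks every client whose store still contains the original root, so a client can only enforce the constraint once that certificate is gone everywhere it matters.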

Hopefully that clarifies why I don’t believe this resolves the compliance issue, introduces unnecessary risks to clients, overlooks relevant factors that seem to materially affect the projected impact, and carries unnecessary risk should Sectigo take steps to resolve those (e.g. by revoking the roots, which in practice is not possible short of blocklisting those non-compliant certs).

Flags: needinfo?(ryan.sleevi)

I don't believe the proposed approach actually remediates the compliance issue, precisely because signing with the same name and key means these are still OCSP responses issued by CAs without the digitalSignature bit set. They are also responses issued by doppelgängers with that bit set, but that is irrelevant to the violation.

As it happens, we did consider this line of thinking before we proposed the root replacement plan, and we have considered it again in more detail since you posted comment 3. I'll briefly describe some of our thoughts on this matter before moving on to discussing how we will respond to your concern.

Firstly, we have formed the view that the BRs are unclear about the requirements for doppelgänger root certificates. BR 1.6.1 includes the following definitions:

"Root CA: The top level Certification Authority whose Root Certificate is distributed by Application Software Suppliers and that issues Subordinate CA Certificates."

"Root Certificate: The self‐signed Certificate issued by the Root CA to identify itself and to facilitate verification of Certificates issued to its Subordinate CAs."

Taken together, these Defined Terms seem to imply that any self-signed certificate that is not "distributed in widely-available application software" is not a "Root Certificate" in BR terms; but if a doppelgänger root certificate is not a "Root Certificate", then what is it, and what rules apply?

The first paragraph of BR 1.1 uses similar language to describe the scope of the BRs as "...the issuance and management of Publicly‐Trusted Certificates; Certificates that are trusted by virtue of the fact that their corresponding Root Certificate is distributed in widely‐available application software." Note that "their corresponding Root Certificate" is expressed in singular form, just as it is in the Defined Terms mentioned above. This suggests that the BRs don't even envisage the existence of doppelgänger root certificates.

Secondly, having been unable to obtain clear guidance from the BRs, we've looked at some examples of how Mozilla has dealt with, and is dealing with, root certificate replacement requests resulting in doppelgänger root certificates:

It was our conclusion that our proposal to replace our roots was following a safe and well-trodden path.

As a practical matter, the plan outlined by Sectigo (and GTS) effectively prevents clients from actually enforcing this constraint. As we saw with previous issues with OCSP responder certificates, clients enforcing constraints (as Firefox does with "must not have CA bit set") can be critical to mitigating risks from CAs.

This practical matter is of course legitimate for client developers and root program owners to consider. Out of interest, are you aware of any clients that have wanted to, or even attempted to, enforce this particular constraint (i.e., digitalSignature MUST be set for direct OCSP response signing)?

Just an observation: unlike the concern that necessitated this bug, this practical matter seems to apply to all root certificates that, due to their age, don't completely adhere to the BRs' Root CA Certificate profile, not just those root certificates that are perceived to be violating a requirement that pertains to ongoing use of the Root CA Private Key.

Hopefully that clarifies why I don't believe this resolves the compliance issue, introduces unnecessary risks to clients, overlooks relevant factors that seem to materially affect the projected impact, and carries unnecessary risk should Sectigo take steps to resolve those (e.g. by revoking the roots, which in practice is not possible short of blocklisting those non-compliant certs).

It seems to us that the best path forward right now is to take this discussion to m.d.s.p., to seek official input from some root program owners and to ensure that GTS are aware of your concern and have an opportunity to reconsider their plan before the December 2021 Batch of Root Changes drops.

The cost savings arguments are, equally, a little questionable, because this applies only to responses generated by the roots, e.g. for intermediates. The BRs allow very long lifetimes for such responses, indicating high cacheability by clients. Similarly, a number of clients, most notably CryptoAPI, prefer CRLs over OCSP for non-end-entity certificates, not to mention a number of clients that use alternative means of revocation checking for intermediates (Valid, CRLite, OneCRL, CRLSets).

Thanks for these notes. We hadn't considered the long lifetimes and high cacheability.

We'll hold fire on our root replacement plan for now, pending the outcome of the m.d.s.p discussion that I've just initiated. We are open to reconsidering our plan.

(In reply to Rob Stradling from comment #4)

Firstly, we have formed the view that the BRs are unclear about the requirements for doppelgänger root certificates. BR 1.6.1 includes the following definitions:

"Root CA: The top level Certification Authority whose Root Certificate is distributed by Application Software Suppliers and that issues Subordinate CA Certificates."

"Root Certificate: The self‐signed Certificate issued by the Root CA to identify itself and to facilitate verification of Certificates issued to its Subordinate CAs."

Taken together, these Defined Terms seem to imply that any self-signed certificate that is not "distributed in widely-available application software" is not a "Root Certificate" in BR terms; but if a doppelgänger root certificate is not a "Root Certificate", then what is it, and what rules apply?

I don’t think this holds, if I understand your argument correctly. It seems to incorrectly treat this as a mutually exclusive term: reading “the self-signed Certificate” as meaning there can be no other self-signed certificates, rather than simply as an identifier within a set. In other words, it reads the definition as “the (only possible) self-signed certificate” rather than as “the subject certificate (in an unbounded set of certificates)”.

I can appreciate taking this perspective in isolation, but it similarly doesn’t hold up when considering the BRs as a whole. Taken to its logical conclusion, this would imply that if a CA issues two self-signed certificates with the same name and key, then it can assert that one is exempt from the BRs. Call these two certificates “CA” and “Mirror CA”: the CA could then say that all good issuances belong to “CA” and all misissuances belong to “Mirror CA”, which isn’t subject to the BRs; ergo, they are not misissuances.

The clearer argument against this is simply looking at the Mozilla expectation of “directly or transitively issued”, and subject to disclosure and the BRs. Regardless of your view of the BRs’ perspective on doppelgängers, which the previous remarks hopefully show is unfounded, the expectation that all certificates are in scope should resolve any ambiguity. This is the same reason that we see disclosures regarding doppelgängers, of roots and of subordinate CAs. Attempting to carve out scope is something that has been repeatedly addressed for years, which is where and why I fail to see how this plan takes that into consideration.

Have I misunderstood your definitional argument? If so, is that misunderstanding relevant, given the scope and repeated clarity of that scope re: doppelgängers? If not, it’s worth more fully exploring the options and trade-offs. Part of my concern on this plan is that it treats root stores as needing to update according to CAs’ desires, which we last saw during the SHA-1 transition. Multiple CAs unfortunately took the problematic view that if they requested removal from a root store, then they were immediately free to misissue SHA-1 certificates. That significantly increased the risks for the transition, and undermined the whole objective of the transition in the first place, in terms of CA and ecosystem benefits.

The plan, as proposed, seems to take the view either that:

  1. Issuing OCSP responses immediately after replacement by a single root program (or those that more consistently enforce the BRs) is acceptable, or
  2. Misissuance and misuse of the private key is acceptable for some ill-defined interim period unless and until some actor defines it as not misissuance (whether by replacing the root or redefining the BRs)

The former is obviously incredibly risky for any root program to be comfortable with. Should Mozilla accept this interpretation if a CA uses a change reflected by Microsoft to justify it? And does the latter place all blame on the root program for misissuance, such that “if you don’t like it, you have to act to remove us”?

This is part of where and why I see the risk, both in this individual action (it doesn’t resolve the compliance issue) and in the overall ecosystem impact. Objectively, this plan isn’t compliant with the policies, and subjectively, with the goals for user security. It may turn out to be “the best bad idea in a barrel of bad ideas”, but if that’s the case, more details, discussion, and write-up are needed.

Ryan, thanks for your assessment of our thought process in comment 5 and for your input on the m.d.s.p discussion.

We agree that “long-term ecosystem health” should be a primary concern, and we feel that establishing consensus (regarding the question of whether or not root replacement, without revoking or blocklisting the old root, remediates the “missing” digitalSignature bit concern) is an important step in that direction. We took this matter to m.d.s.p because, although your viewpoint seemed to go against the established consensus (i.e., nobody objected to “The reason” for GTS’s root replacement), we were keen for it to be properly discussed and carefully considered by the current root program owners and other members of the community. (Bugzilla is great for dealing with incidents and their remediation where consensus on interpreting relevant policy has already been established, but wherever this is not the case we think it’s important to raise the issue with the wider audience that m.d.s.p no doubt reaches.)

A little earlier today Ryan Dickson contributed Chrome’s opinion to the m.d.s.p discussion regarding the matter for which we are seeking consensus, and we remain extremely keen and hopeful that the other current root program owners and some other members of the community will chime in with their views as well. As we noted in comment 4, we await the conclusion of that discussion before we will either resume or reconsider our root replacement plan.

Ben, to encourage participation in the m.d.s.p discussion, would you consider announcing a cut-off date for comments and also setting Next-Update on this bug to a few days after that date?

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2021-12-17

(In reply to Rob Stradling from comment #7)

We agree that “long-term ecosystem health” should be a primary concern, and we feel that establishing consensus (regarding the question of whether or not root replacement, without revoking or blocklisting the old root, remediates the “missing” digitalSignature bit concern) is an important step in that direction.

Could you help me understand this?

That is, are you saying that you still believe that, despite Comment #5, the plan you've outlined does comply with the BRs and Mozilla Root Store Policy? Or rather, are you saying that you still believe it's the most appropriate course of action?

Certainly, the conversation about whether it's an appropriate step is worthwhile, but the way you've phrased the comment here suggests that you may still believe your plan would actually remediate non-compliance.

I'm more than happy to again reiterate why, on a technical level, Sectigo's plan cannot be seen as remediating the issue, and fails to comply with both the letter and the spirit of policy. However, I would have hoped that, rather than that, Sectigo might provide a more thoughtful explanation, in the spirit of that offered in Comment #5, that examines what technical considerations and solutions were chosen. It's quite clear that there are several options - some of which unambiguously remediate the issue - and so if Sectigo is going to argue a path that, at best, is seen as ambiguous, it seems like at least some effort is warranted to further explain, and, as noted, consider providing actual data to support that conclusion.

Flags: needinfo?(rob)

And to be clear: A remediation plan is not necessarily instantaneous (e.g. the OCSP responder issue), but at least outlines a set of steps to resolve any ambiguity or dispute.

(In reply to Ryan Sleevi from comment #8)

(In reply to Rob Stradling from comment #7)
...are you saying that you still believe that, despite Comment #5, the plan you've outlined does comply with the BRs and Mozilla Root Store Policy? Or rather, are you saying that you still believe it's the most appropriate course of action?

We did not intend comment 7 to make any statement about the plan we ultimately will follow. As stated in that comment, we would like to understand the major root program owners' opinions on this matter, how it affects us and GTS, and the appropriate path forward for a CA in this position.

We are troubled that two CAs can propose the exact same mitigation plan just a few months apart and receive opposite reactions. We are confused as to how the community welcomed the exact plan stated in comment 0 when it was the GTS mitigation plan, and yet with our proposal it came under fire. We believe it is relevant to understand what, if anything, is different between the situations for GTS and ourselves. In the event there is no difference, it is relevant to understand how the earlier plan went unchallenged while the one in this bug met with strong, detailed objections. If nothing else, perhaps we can shed light on how it is that the earlier plan slipped through the cracks if, as you phrased it in comment 5, it "[doesn't resolve] the compliance issue, introduces unnecessary risks to clients, overlooks relevant factors that seem to materially affect the projected impact, and carries unnecessary risk".

If the ultimate response from the root program owners turns out to be, "Gee, we didn't notice that at the time. Sometimes second looks are helpful. Mea culpa", then that's fine. But we haven't heard that, or any coherent response as to why the same plan when offered by separate CAs would meet with such different results.

So we were asking the relevant participants what they had to say. We heard from you. We heard a partial answer from Ryan Dickson representing the Chrome Root Program. Chrome's is, however, not the only root program we deem important, and we were hoping that others would choose to participate in this discussion as well. Comment 7 was an attempt to spur such conversation. Ben indicated that he would close the m.d.s.p discussion yesterday, but unfortunately not a single root program owner has responded to your assertion in comment 3 that you "don't believe the proposed approach actually remediates the compliance issue". This means that, at the moment, the facts appear to be that GTS is allowed to remediate the compliance issue by replacing their roots, but Sectigo is not. This really is an extraordinary situation!

...if Sectigo is going to argue a path that, at best, is seen as ambiguous, it seems like at least some effort is warranted to further explain, and, as noted, consider providing actual data to support that conclusion.

See above for the intent of comment 7. When we formulated our plan it appeared clear that we were following the established mitigation strategy for this precise issue. We are open-minded about changing that plan. But we deeply dislike the fact that the exact same proposal can get different responses when offered by different people. We feel that kind of ambiguity is unhealthy for the WebPKI and that this is a potentially useful moment for understanding how to advance clarity and consistency in this industry.

Flags: needinfo?(rob)

Thanks Rob for Comment #10.

I think it's worth highlighting that there's still not an answer for how it demonstrates compliance here, other than a presumption that "GTS did it, ergo it must be compliant". I'm not sure that helps address the technical concerns raised, and while I appreciate the meta-discussion you're raising, it also seems worthwhile to engage directly on the substance, which would help advance clarity and consistency in the industry.

With respect to the GTS issue, you can see that there were more substantial concerns raised with the root replacement process, raised in Bug 1709223, which dominated the discussion, and equally highlighted a non-compliant practice. Just as Bug 1675821 didn't resolve or remediate Bug 1709223, I don't think it's necessarily fair to expect a clean resolution.

I can appreciate the desire to "do what other CAs have done", and agree that, in isolation, that may highlight good practices. But I think it's still germane to the discussion to argue why and how you do believe it remediates, given the concerns raised, and as noted in Comment #8, the factors considered.

I'm trying to understand what your desired outcome is here. I haven't really seen an argument that is advanced on policy or technical grounds as to how this can be seen as remediating the issue, given the doppelganger scenario discussed, that is consistent with past policy. The closest seems to be the comparison to GTS, but that doesn't speak to any of GTS' long-term plans for remediation. Is it possible that you can make the argument, ignoring GTS, which might help provide clarity here?

Flags: needinfo?(rob)

Thank you, Ryan. We acknowledge your comment 11 and we also note that Ryan Dickson has now confirmed that the Chrome Root Authority Program shares your view. We are discussing how to move forward on this matter and expect to be able to announce an updated remediation plan in early January.

Ben, please could we set Next-Update to 7th January?

Flags: needinfo?(rob) → needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2021-12-17 → [ca-compliance] Next update 2022-01-07

(In reply to Ryan Sleevi from comment #11)

I think it's worth highlighting that there's still not an answer for how it demonstrates compliance here, other than a presumption that "GTS did it, ergo it must be compliant". I'm not sure that helps address the technical concerns raised, and while I appreciate the meta-discussion you're raising, it also seems worthwhile to engage directly on the substance, which would help advance clarity and consistency in the industry.

Ryan, we would like to explain why we were reluctant to engage with this request whilst the m.d.s.p discussion was ongoing. It's worthwhile to remember that the BRs and EVGs carry enforcement weight only insofar as compliance to these guidelines is mandated by the various root program policies; and for better or for worse, whenever there is disagreement over how to interpret any aspect of root program policy, ultimately the opinions of the root programs are the ones that have teeth. Therefore, when the public record shows that multiple root programs have adopted a particular interpretation of a requirement, it would seem like foolishness for a CA to seek to interpret that same requirement without giving greater credence to the root programs' established interpretation than to the CA's own opinions. The CABForum guidelines have long been and continue to be works in progress, meaning that all CAs must look to the community's interpretation to understand some specifics of their expected behaviour. The dialogue on this very thread is an example of these guidelines' loose language and the occasional need to clarify expectations, especially when dealing with their less commonly referenced sections. Comment 4 provides specific examples of how the community has arrived at interpretations and then allowed CAs following them to be considered "compliant".

This bug was a case where clarified expectations would have been helpful, especially as we saw the community taking a very different de facto position from the one you took here. We do thank Ryan Dickson for coming back with a clear position that was supportive of your interpretation whilst concessively tolerating that de facto position. However, since the Chrome Root Program represents only one of the root programs to which we are beholden and since no other root program offered any opinion on your interpretation in the m.d.s.p discussion, we don't believe this one statement is sufficient to continue down a path that we had originally believed to be consecrated by community opinion. We deem it an unacceptable risk to perhaps one day find ourselves facing the contention that the de facto position was not actually held or tolerated by all of the relevant root programs after all.

Therefore we intend to fully and unambiguously resolve this compliance issue by implementing delegated OCSP responders for our affected Root CAs listed in comment 0. We are working on the specifics of the release schedule but currently anticipate a February release. We had not yet issued the replacement root certificates we announced in this bug and will not now do so.
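For readers less familiar with the distinction at issue, here is a minimal sketch of the two eligibility checks involved. It is not Sectigo's implementation. A CA that signs OCSP responses directly needs the digitalSignature bit in its keyUsage extension. A delegated responder also needs the id-kp-OCSPSigning extended key usage (OID 1.3.6.1.5.5.7.3.9). The function names and the simplified inputs are assumptions made for illustration: a raw DER-encoded keyUsage BIT STRING and a set of EKU OID strings. A real validator would also confirm that the delegated responder certificate was issued directly by the CA, and would check validity periods and signatures.

```python
OCSP_SIGNING_OID = "1.3.6.1.5.5.7.3.9"  # id-kp-OCSPSigning

# KeyUsage bit names in RFC 5280 order; bit 0 is the most significant
# bit of the first content byte of the BIT STRING.
KEY_USAGE_NAMES = [
    "digitalSignature", "nonRepudiation", "keyEncipherment", "dataEncipherment",
    "keyAgreement", "keyCertSign", "cRLSign", "encipherOnly", "decipherOnly",
]


def key_usage_bits(ku_der: bytes) -> set:
    """Decode a DER-encoded KeyUsage BIT STRING into a set of bit names."""
    if len(ku_der) < 3 or ku_der[0] != 0x03:
        raise ValueError("expected a DER BIT STRING (tag 0x03)")
    length = ku_der[1]
    # KeyUsage values are short, so only short-form lengths are handled here.
    if length & 0x80 or len(ku_der) != 2 + length:
        raise ValueError("unexpected length encoding for a KeyUsage value")
    unused_bits = ku_der[2]
    content = ku_der[3:]
    total_bits = len(content) * 8 - unused_bits
    names = set()
    for i in range(total_bits):
        if content[i // 8] & (0x80 >> (i % 8)):
            names.add(KEY_USAGE_NAMES[i] if i < len(KEY_USAGE_NAMES) else f"bit{i}")
    return names


def may_sign_ocsp(ku_der: bytes, eku_oids: set, is_issuing_ca: bool) -> bool:
    """Simplified check of whether a certificate may sign OCSP responses."""
    key_usages = key_usage_bits(ku_der)
    if is_issuing_ca:
        # Direct signing: the CA certificate itself must assert digitalSignature.
        return "digitalSignature" in key_usages
    # Delegated responder: digitalSignature plus the OCSP signing EKU.
    return "digitalSignature" in key_usages and OCSP_SIGNING_OID in eku_oids
```

For example, a CA certificate whose keyUsage asserts only keyCertSign and cRLSign encodes as the bytes 03 02 01 06. For that value, `may_sign_ocsp(..., is_issuing_ca=True)` returns False. That is the situation that leads to delegated responders.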

Ben: We don't expect key staff to be available to work on this project during January, so we are asking to set "Next update" to February 2nd.

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2022-01-07 → [ca-compliance] Next update 2022-02-02

Rob,

Thanks for the continued engagement here. I’m not trying to beat you up, but I think it does bear pointing out that however reasonably your argument is framed here in Comment #13, it does seem devoid of technical substance that would help understand or prevent these issues going forward, and so I want to continue to push back and ask for a more meaningful reply.

For example, you state:

Therefore, when the public record shows that multiple root programs have adopted a particular interpretation of a requirement,

But you don’t actually provide any details about what you believe that public record to be. It’s unclear, for example, if this is just because the same issue wasn’t pointed out at the same time to GTS, that you believed it acceptable, or if there’s some other element here that you see at play.

I think an approach to compliance that looks at the absence of criticism to claim approval or adoption is bound to be problematic, because it places all of the burden of actually following the BRs onto the Root Programs to detect and call out noncompliance. I would hope it’s clear how dangerous such a conclusion would be, and that’s why I continue to push to understand what facts of your own evaluation process were at play here.

We deem it an unacceptable risk to perhaps one day find ourselves facing the contention that the de facto position was not actually held or tolerated by all of the relevant root programs after all.

I agree, that is a risk, and that’s why it is important to understand the process for evaluating things. In particular, given the repeated consideration of doppelgängers, I am trying to understand how this conclusion could have been overlooked. I am suggesting, for example, that Sectigo’s approach to compliance factor in all the relevant information, such as that in X.509 and RFC 5280, and certainly guidance such as RFC 4158.

I think my challenge here is in understanding how this situation is, say, meaningfully different than underscores, which were an example of something that was never permissible in the standards: some CAs ignored those standards, and then claimed that because browsers did not take action, such as distrust, it was acceptable to continue to do so.

It may be that the answer is simply Sectigo didn’t do an independent evaluation of acceptability, and simply relied on the assumption that if GTS did it, it must comply. That at least seems to be the case here, and if so, it seems to overlook any other factors that could be relevant to consideration with GTS (e.g. if they planned to replace now, and use Delegated/Authorized responders as the full long term remediation), and it also overlooks Sectigo’s own responsibility to evaluate.

That’s why I’m still hanging on here. While I’m happy that it sounds like there is a remediation path for the near term, it seems like this is an opportunity for Sectigo to revisit and re-evaluate how it assesses compliance, to help prevent future misunderstandings or issues.

Ryan,

We acknowledge your question from comment 14. We have not yet had a chance to compose a reply. That response is forthcoming.

(In reply to Ryan Sleevi from comment #14)

It’s unclear, for example, if this is just because the same issue wasn’t pointed out at the same time to GTS, that you believed it acceptable, or if there’s some other element here that you see at play.

Beyond the simple lack of objection to GTS's plan, we observed active assent from the browser community, which was very important to our interpretation of de facto requirements. In comment 4 I linked to the "associated m.d.s.p thread" for the "Public Discussion of Google Trust Services' Request to Replace Root CA Certificates", in which Ben Wilson wrote,

The reason for their replacement is that the original CA certificates do not contain the digitalSignature key usage bit, which is required for direct OCSP signing by the CA. (See https://bugzilla.mozilla.org/show_bug.cgi?id=1652581)

We read that as a statement of acceptance by Mozilla that GTS's plan was an effective mitigation of the compliance issue. Since m.d.s.p is the de facto venue for discussing CA Compliance issues, and since representatives of multiple root programs often participate, it seemed to us that this statement in combination with the lack of objection or expressed concern from the rest of the community (including the Chrome Root Program) indicated that this matter had been considered and resolved.

We hear your point about the difficulty in relying on the Root Programs to detect and call out examples of noncompliance. However, it's important to remember that the Root Programs are the final arbiters for how the BRs will be applied; they can choose to disregard any or all portions of the BRs, and their interpretations override the interpretations of individual CAs and the wider community. In this case we saw a proactive statement and actions by a Root Program directly supporting the GTS plan.

I would hope it’s clear how dangerous such a conclusion would be, and that’s why I continue to push to understand what facts of your own evaluation process were at play here.
...
It may be that the answer is simply Sectigo didn’t do an independent evaluation of acceptability, and simply relied on the assumption that if GTS did it, it must comply. That at least seems to be the case here, and if so, it seems to overlook any other factors that could be relevant to consideration with GTS (e.g. if they planned to replace now, and use Delegated/Authorized responders as the full long term remediation), and it also overlooks Sectigo’s own responsibility to evaluate.

Our "own evaluation process" determined that our pre-BR roots were exempt from this requirement and that therefore the "missing" digitalSignature bit didn't actually prohibit us from directly signing OCSP responses.

That analysis was rejected by the community. After that outcome, we didn't consider it wise to rely solely on our "own evaluation process" to (re)interpret the same portion of the BRs. Rather, we wanted to avoid future missteps. Comment 4 details the problematic language in the BRs that directly affected our (re)evaluation of this issue and our response to that problem, which was to look at the discrete actions of Root Programs in related matters. Unfortunately, what looked like clear precedent was not actually a predictor of how this issue would resolve itself.

That’s why I’m still hanging on here. While I’m happy that it sounds like there is a remediation path for the near term, it seems like this is an opportunity for Sectigo to revisit and re-evaluate how it assesses compliance, to help prevent future misunderstandings or issues.

It's hard to assess compliance to policy language that, lacking precision, is interpreted differently by different parties at different times. It was precisely "to help prevent future misunderstandings or issues" in our “own evaluation process” that we asked to add "Understanding Requirements for Legacy CA Certificates" as a topic in the last CAB Forum F2F. We were hoping for a productive discussion regarding the principles surrounding when exemptions would be granted to roots for pre-existing status, which would have helped us to "re-evaluate how [Sectigo] assesses compliance". Although we didn't see a great deal of engagement with that topic in the meeting, the few people that commented all seemed to agree that any use of a CA private key is subject to the then-current requirements. Our "own evaluation process" has taken this viewpoint on board.

We subsequently hoped that the m.d.s.p thread regarding root replacement and doppelgängers would drive some dialog in the community and ultimately clarify expectations about the policy language that we felt lacked precision. Unfortunately, in that case it took repeated requests to receive a complete response from one Root Program and we got nothing of substance from the others.

So at this point we're closely focused on implementing something that will allow us to move forward. It's been an extremely frustrating experience for us as this is now the sixteenth comment on an issue for which we started with a plan we'd deliberately built to match what the community's words and actions had shown it wanted. Our pragmatic new-new remediation plan notwithstanding, we're not convinced the wider discussion has run its course. We believe that the community would do well to dive deeper into issues surrounding pre-existing roots and exemptions and to attempt to clarify the corresponding policy language.

We are in the implementation process for adding delegated responder support to our offline signing. We have bundled this update into a larger package of changes to improve OCSP pre-generation performance.

We don’t presently have a firm release target for this project, although we know deployment will not occur prior to early March.

Ben, can we have a Next Update for March 8? At that date we expect to be able to announce a target date.

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2022-02-02 → [ca-compliance] Next update 2022-03-08

We are aiming for deployment mid-April. Ben can we have a Next Update on April 18?

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2022-03-08 → [ca-compliance] Next update 2022-04-18

On March 28 we issued our new certificates for signing OCSP responses. We are working on our schedule for deployment and will announce it when we have a firm target date.

As mentioned recently in bug 1763203 comment 1, we are targeting May deployment of our new OCSP code. We will maintain weekly updates until that occurs.

We continue to target a May release.

We are still targeting a May release.

During testing of our new OCSP responder code, we have discovered an issue that will require additional work to address, which puts our May release in jeopardy. We will require time for investigation before we can confirm or set a new target date. We will inform the community when we know more.

This last weekend we deployed to our CA system the code necessary to integrate it with our new and improved OCSP system. This new CA code is presently switched off while our Operations team continues deploying the infrastructure needed to support the new OCSP responders and response generators that will supersede the equivalent components in our previous OCSP system.

The exact time we will need to deploy the new components, test load, and make adjustments is unclear, but we do not expect to be complete this month. Ben, can we have a Next Update on June 14?

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2022-04-18 → [ca-compliance] Next update 2022-06-14

We are in the process of infrastructure deployment, testing, and adjustment described in comment 24. This process will take place from now through July, with various elements of our infrastructure deployed and tested during that timeframe. Our target for full production is sometime in August.

Ben, can we have a Next Update on July 27 to give the community an update on how that process has gone?

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2022-06-14 → [ca-compliance] Next update 2022-07-27

On July 6 and 7, 2022 we experienced a significant slowdown in OCSP responses as a direct consequence of a previously undiscovered bug, which manifested itself upon deployment of new code in preparation for rollout of our Delegated OCSP Response service. As a consequence of this slowdown and our actions to fix it, we experienced OCSP response validation errors during this time. These errors were fully resolved on the morning of July 7 (GMT), less than one day after they began.

These errors directly resulted from our phased rollout of Delegated OCSP. We are working on a much more detailed post explaining what occurred, why, and what we did to address it.

We are working on a full writeup for the episode discussed in comment 26.

Description of incident

We experienced a severe backlog in the pregeneration and publishing of OCSP responses between 2022-07-06T13:15:00Z and 2022-07-07T10:38:00Z, resulting in many failures to respond correctly to OCSP requests during that timeframe. This outage was an unanticipated side effect of the ongoing phase-in of our new OCSP service.

On July 6 we began using a new batch of offline-signed OCSP responses for the certificates issued by the roots mentioned in comment 0; for the first time, these OCSP responses were signed by delegated OCSP signers. A consequence of this change is that many of these OCSP responses exceeded 2000 bytes in size, which is something that has not occurred previously within our CA infrastructure.

At this time, we had deployed enough of the back end of our new OCSP service to be able to preproduce the to-be-signed OCSP responses that would then be signed by the delegated OCSP signers. However, we were still running our “old” front-end OCSP responders. We had believed that publishing the new batch of OCSP responses would be perfectly safe, as we expected a 32K size threshold and our “old” OCSP responders had already been used in the past to produce and publish online-signed delegated OCSP responses. However, a pair of previously unknown bugs in how our database backend processes our workstreams caused them to fail for responses of more than 2000 bytes in size. These bugs were environment-specific and did not manifest during testing.

The first of these bugs involved our batch-processing engine for writing OCSP responses to the database: our database’s bulk-writing operation failed for BLOBs (Binary Large Objects) of more than 2000 bytes. The second came from a specific API call we were using to read OCSP response BLOBs, which likewise failed on BLOBs larger than 2000 bytes. In both instances we had understood from the available documentation that the threshold was 32K. Because our OCSP responses prior to this deployment were universally smaller than 2000 bytes, we had not yet discovered this misunderstanding.

At 2022-07-06T16:15:00Z we deployed an urgent hotfix that bypassed the bulk-writing function in favor of individual write operations for OCSP responses above 2000 bytes. This resolved the first of the two issues explained in the prior paragraph, as the 2000-byte limit no longer applied. At approximately 2022-07-06T22:00:00Z we deployed a second urgent hotfix to divert the reading of responses above 2000 bytes to a different database API call that did not have this 2000-byte limit.
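The size-based routing introduced by the 16:15 hotfix can be sketched as follows. This is a hypothetical illustration only. `write_responses`, `bulk_write`, `single_write`, and `OcspResponseRecord` are invented names, and the callables stand in for whatever database APIs were actually involved. The 2000-byte figure is simply the limit reported above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

BULK_BLOB_LIMIT = 2000  # bytes; the limit observed during the incident


@dataclass
class OcspResponseRecord:
    serial: str  # certificate serial number (hex)
    der: bytes   # DER-encoded OCSP response


def write_responses(
    records: Iterable[OcspResponseRecord],
    bulk_write: Callable[[List[OcspResponseRecord]], None],
    single_write: Callable[[OcspResponseRecord], None],
    limit: int = BULK_BLOB_LIMIT,
) -> Tuple[int, int]:
    """Send small responses through the bulk path and large ones individually.

    Returns (number written in bulk, number written individually).
    """
    records = list(records)  # allow a single pass over any iterable
    small = [r for r in records if len(r.der) <= limit]
    large = [r for r in records if len(r.der) > limit]
    if small:
        bulk_write(small)
    for record in large:
        single_write(record)  # hypothetical per-row write without the size limit
    return len(small), len(large)
```

In this sketch, records of exactly 2000 bytes stay on the bulk path. The writeup describes failures only for BLOBs of more than 2000 bytes.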

Between 2022-07-06T13:15:00Z and 2022-07-06T22:00:00Z, our responders were unable to download the stream of newly generated/published OCSP Responses from our (soon to be) legacy OCSP response generation system, which caused a significant "download backlog." As soon as this problem was fixed, the backlog began going down. The backlog fully cleared by approximately 2022-07-07T05:20:00Z.

As this download backlog was clearing, it became apparent to us that the delegated OCSP signer certificates were not always being embedded in the OCSP responses. (This embedding is expected to occur when an OCSP response is published, not when it is signed.) Investigation revealed that our (soon to be) legacy OCSP response publication system was being tripped up by the fact that these OCSP signer certificates are associated with the self-signed Root Certificate records in our database, whereas the Subordinate CA Certificates (and hence their corresponding OCSP response records) are mostly associated with cross-certificates issued to those Roots. At approximately 2022-07-06T22:50:00Z we deployed another urgent hotfix that caused our systems to consider OCSP signer certificates associated with a self-signed Root Certificate to also be applicable to cross-certificates for the same Root.
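The lookup rule from the hotfix deployed at approximately 2022-07-06T22:50:00Z can be modeled as below. This sketch is hypothetical. The record layout and `signers_for` are invented. The test for "the same Root" is also an assumption: here, two CA certificate records match when they share a subject name and a public key hash. The writeup does not say how the systems actually identify matching records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaCertRecord:
    record_id: int
    subject: str
    spki_sha256: str  # hex SHA-256 of the SubjectPublicKeyInfo
    self_signed: bool


def same_ca_key(a: CaCertRecord, b: CaCertRecord) -> bool:
    # Assumption: a self-signed root and its cross-certificates share subject and key.
    return a.subject == b.subject and a.spki_sha256 == b.spki_sha256


def signers_for(ca: CaCertRecord, all_cas, signers_by_ca_id) -> list:
    """Return OCSP signer IDs usable for responses about certificates issued by `ca`.

    Signers linked directly to `ca` are always included. If `ca` is a
    cross-certificate, signers linked to the matching self-signed root are
    included too, which is the behavior the hotfix added.
    """
    found = list(signers_by_ca_id.get(ca.record_id, []))
    if not ca.self_signed:
        for other in all_cas:
            if other.self_signed and same_ca_key(ca, other):
                for signer in signers_by_ca_id.get(other.record_id, []):
                    if signer not in found:
                        found.append(signer)
    return found
```

Without the cross-certificate branch, a lookup keyed on the cross-certificate record would find no signer. That failure mirrors the missing embedded signer certificates described above.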

We continued to monitor performance of this system and observed that, even after the backlog had fully cleared, we were still experiencing OCSP response validation errors. Investigation revealed a bug in the first hotfix we had deployed the previous day: an error in the new non-bulk writing code path meant that >2000-byte OCSP responses were being added to the published OCSP responses table in the database, but were not being correctly queued for download by our "old" OCSP responders. At approximately 2022-07-07T10:15:00Z we deployed an urgent hotfix to address this issue. To ensure that the effect of this fix would be seen externally as soon as possible, we then executed an UPDATE statement on our database to cause the (re-)publication of the current OCSP response for each offline-signed certificate. It took a few minutes for all our "old" OCSP responders to download those OCSP responses. Following that, we purged our Cloudflare cache so that relying parties would benefit straight away. This purge occurred at 2022-07-07T10:38:00Z.
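The queueing bug above broke an invariant: every published response must also be queued for download by the responders. The sketch below uses an in-memory SQLite database to show one way to keep both writes in a single transaction. It also includes a consistency check and a bulk re-queue that is loosely analogous to the republication step described above. The table names, columns, and functions are all hypothetical and do not reflect Sectigo's actual schema. Because of how this toy schema models the queue, the re-queue uses INSERT ... SELECT rather than an UPDATE.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE published_response (serial TEXT PRIMARY KEY, der BLOB NOT NULL);
        CREATE TABLE download_queue (id INTEGER PRIMARY KEY AUTOINCREMENT, serial TEXT NOT NULL);
        """
    )
    return db


def publish_response(db: sqlite3.Connection, serial: str, der: bytes) -> None:
    """Store a response and queue it for responders in one transaction."""
    with db:  # commits on success, rolls back if either statement fails
        db.execute(
            "INSERT OR REPLACE INTO published_response (serial, der) VALUES (?, ?)",
            (serial, der),
        )
        db.execute("INSERT INTO download_queue (serial) VALUES (?)", (serial,))


def unqueued_serials(db: sqlite3.Connection) -> list:
    """Consistency check: published responses that no responder will download."""
    rows = db.execute(
        """
        SELECT p.serial FROM published_response p
        WHERE NOT EXISTS (SELECT 1 FROM download_queue q WHERE q.serial = p.serial)
        ORDER BY p.serial
        """
    )
    return [row[0] for row in rows]


def republish_all(db: sqlite3.Connection) -> None:
    """Re-queue every current response so responders fetch it again."""
    with db:
        db.execute("INSERT INTO download_queue (serial) SELECT serial FROM published_response")
```

Running a check like `unqueued_serials` after each deployment would have surfaced responses that were stored but never queued.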

In comment 0 we explained how we had been left with no choice but to switch to using delegated OCSP for our legacy Roots, due to requirements that did not yet exist at the time the Root Certificates were issued. Due to the timeline for full deployment of our new OCSP service being pushed back several times, we felt that we should accelerate the switch to delegated OCSP for our legacy Roots as soon as enough of the new infrastructure was deployed to make this possible.

Once our new OCSP responder system is fully operational and proven reliable, we will be able to sunset the previous systems that were the source of these problems. The new system relies on a different database management system that does not impose size limitations on BLOBs, and we have verified that the new system correctly considers OCSP signer certificates associated with a self-signed Root Certificate to also be applicable to cross-certificates for the same Root, so there should be no potential for any of these issues to recur.

The factors that brought about this outage would have been difficult to foresee. Our understanding of size limitations in our backend database in certain contexts turned out to be incorrect, which we only discovered when it actually became a problem. As we were working to deploy rapid hotfixes, a bug slipped through in one of them. These errors occurred in old code that we were phasing out for new code that we expect to be more robust, and it was the actual act of phasing over that brought about the outage.

Timeline

2022-07-06T13:15:00Z – OCSP backlog begins
2022-07-06T16:15:00Z – First urgent hotfix deployed
2022-07-06T22:00:00Z – Second urgent hotfix deployed
2022-07-06T22:50:00Z – Third urgent hotfix deployed
2022-07-07T05:20:00Z – Backlog fully cleared
2022-07-07T10:15:00Z – Fourth urgent hotfix deployed
2022-07-07T10:38:00Z – Cloudflare cache purged; issue resolution complete

Thanks, Tim, for the update.
Can we get an estimated timeline for finishing the work needed to complete the transition (and close this issue)?
Meanwhile, I have set the next update to Sept. 1.
Ben

Flags: needinfo?(tim.callan)
Whiteboard: [ca-compliance] Next update 2022-07-27 → [ca-compliance] Next update 2022-09-01

On August 25, 2022 we completed deployment of and migration to our new OCSP infrastructure.

Since then, we have not encountered any more issues. This deployment and migration complete our remediation of this incident.

Flags: needinfo?(tim.callan)

It appears no further questions have been raised since our deployment. Ben, can we close this bug?

Flags: needinfo?(bwilson)

I intend to close this bug on or about Friday 9-9-2022.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update 2022-09-01 → [ca-compliance] [ocsp-failure]