Closed Bug 1793443 Opened 2 years ago Closed 2 years ago

Microsoft PKI Services: "unknown" OCSP response for issued certificates

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: agwa-bugs, Assigned: johnmas, NeedInfo)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Attachments

(2 files)

17.92 KB, application/zip
5.08 MB, application/x-zip-compressed
Assignee: bwilson → johnmas
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Acknowledging. We will provide a preliminary report later today.

Below is a preliminary incident report, to which we expect to add more detail within 7 days as we investigate.

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.

  • The Microsoft PKI Services (MS PKI) team became aware of this problem when this bug was assigned on 2022-10-03 08:06 PDT. The initial investigation determined that an issue had prevented all of the reported certificates from being published to OCSP. Further investigation is ongoing, and a full report is expected within the next 7 days.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

  • 2022-08-08 16:20 PDT: OCSP publishing issues started for the impacted CA server and region
  • 2022-08-08 16:55 PDT: First reported certificate was created
  • 2022-08-08 18:45 PDT: Final reported certificate was created
  • 2022-08-08 19:44 PDT: OCSP publishing issues ended
  • 2022-10-03 08:06 PDT: This bug was assigned to Microsoft
  • 2022-10-03 09:57 PDT: Reported certificates were manually published to OCSP, mitigating the specific certificates from the initial report

Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

  • Microsoft PKI Services did not issue any of the reported certificates to Subscribers. Our automation prevents issuance to Subscribers for any failure, including OCSP publishing failures.
  • We have already implemented new monitoring and a manual process to mitigate future cases where certificates are not published to OCSP. We are investigating automation solutions to monitor for this scenario and expect to have a plan for this monitoring and automated remediation completed by next week.

In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

  • The first reported certificate was created 2022-08-08 16:55 PDT and the final reported certificate was created 2022-08-08 18:45 PDT. Please see below for the links to the certificates that were not published to OCSP.

In a case involving TLS server certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. It is also recommended that you use this form in your list "https://crt.sh/?sha256=[sha256-hash]", unless circumstances dictate otherwise. When the incident being reported involves an SMIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA256 hash of the certificate. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

  • We are still investigating these details and expect to have a complete answer within 7 days. Part of this investigation involves comparing every currently valid certificate generated by our CA software against OCSP to determine if there are any additional impacted certificates.

List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

  • We are still investigating these details and expect to have a complete answer within 7 days.

While our initial investigation pointed to an isolated issue with OCSP publishing during a brief period, Microsoft PKI Services performed an exhaustive review of all CAs, comparing the certificates generated by each CA to the certificates in OCSP. We determined that the root cause was a failure in the provisioning workflow between generating a certificate and successfully publishing it to OCSP. In that case, the workflow would end after exhausting a retry count, and no further action was taken for the generated certificate. There were no specific alerts or automated processes in place to enforce OCSP publishing in this case.

It is worth noting that in mid-August 2022, we moved the OCSP publishing step much earlier in the provisioning workflow so that pre-certificates are published to OCSP. This resulted in drastically fewer opportunities for workflow failures to occur before OCSP publishing. However, it still left two potential failure points that can result in a certificate not being published to OCSP. As an immediate measure, we mitigated this problem by adding alerting for these failures, as described in our previous response, and by updating our processes to manually publish to OCSP. We are investigating more robust automated solutions to remove the human element from this mitigation process.

Microsoft PKI Services identified 2221 total certificates that did not meet the requirements in Section 5.4 of M.R.S.P. Version 2.8. All of these are now published to OCSP and return a “good” response. We identified 2208 non-expired final certificates that were not in OCSP due to this issue. None of these final certificates were provided to Subscribers, as we only provide them if all workflow steps succeed. We also identified 13 non-expired pre-certificates, generated after 2022-09-30, that were not published to OCSP.
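The review described above amounts to a set difference between the serials a CA has generated and the serials its OCSP service can answer for. The sketch below is illustrative only: `IssuedCert`, `in_scope`, and `find_unpublished` are hypothetical names, the records are in-memory stand-ins for a CA database, and the scope rule (non-expired final certificates, plus non-expired pre-certificates issued on or after 2022-10-01) is an assumption modelled on the counts in this report. Records are keyed by serial, so a final certificate and its pre-certificate collapse to one entry.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Set

# Assumption: pre-certificates are in scope only if issued on or after this date.
PRECERT_CUTOFF = datetime(2022, 10, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class IssuedCert:
    """Hypothetical record of a certificate or pre-certificate the CA generated."""
    serial: str
    issued_at: datetime
    not_after: datetime
    is_precert: bool


def in_scope(cert: IssuedCert, now: datetime) -> bool:
    """True if the CA is expected to have an OCSP response for this record."""
    if cert.not_after <= now:
        return False  # expired records are outside the review
    if cert.is_precert:
        return cert.issued_at >= PRECERT_CUTOFF
    return True  # every non-expired final certificate is in scope


def find_unpublished(issued: Iterable[IssuedCert], ocsp_serials: Set[str], now: datetime) -> List[str]:
    """Sorted serials that are in scope but unknown to the OCSP service."""
    required = {c.serial for c in issued if in_scope(c, now)}
    return sorted(required - set(ocsp_serials))


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2022, 10, 15, tzinfo=utc)
    records = [
        IssuedCert("01", datetime(2022, 8, 8, tzinfo=utc), datetime(2023, 8, 8, tzinfo=utc), False),
        IssuedCert("02", datetime(2022, 9, 1, tzinfo=utc), datetime(2023, 9, 1, tzinfo=utc), True),
        IssuedCert("03", datetime(2022, 10, 5, tzinfo=utc), datetime(2023, 10, 5, tzinfo=utc), True),
    ]
    print(find_unpublished(records, {"03"}, now))  # ['01']
```

Running the same comparison on a schedule, rather than once, turns it into the kind of monitoring the report mentions.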

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

  • This is an implementation gap in handling an edge case: if publishing a certificate to the OCSP provider failed after the certificate was generated, the publish step was retried a fixed number of times and the request was then marked as failed. The design had no provision for publishing these certificates to the OCSP provider after that failure.
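One way to close the gap described above is to make retry exhaustion hand the certificate to a durable remediation path instead of ending the workflow. This is a sketch under stated assumptions, not Microsoft PKI Services code: `publish` stands in for whatever call pushes a certificate to the OCSP provider, `PublishError` is a hypothetical failure type, and the in-memory `remediation_queue` and `alerts` list stand in for a persistent queue and an alerting system.

```python
import time
from collections import deque
from typing import Callable, List


class PublishError(Exception):
    """Hypothetical error raised when publishing to the OCSP provider fails."""


def publish_with_fallback(
    serial: str,
    publish: Callable[[str], None],
    remediation_queue: deque,
    alerts: List[str],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> bool:
    """Try to publish `serial` to OCSP without ever silently dropping it.

    Returns True on success. On retry exhaustion the serial is queued for
    later re-publishing and an alert is recorded, instead of the workflow
    simply ending.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            publish(serial)
            return True
        except PublishError:
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)  # linear backoff
    remediation_queue.append(serial)
    alerts.append(f"OCSP publish failed for serial {serial} after {max_attempts} attempts")
    return False


def drain_remediation_queue(remediation_queue: deque, publish: Callable[[str], None]) -> None:
    """Re-attempt each queued serial once; serials that still fail stay queued."""
    for _ in range(len(remediation_queue)):
        serial = remediation_queue.popleft()
        try:
            publish(serial)
        except PublishError:
            remediation_queue.append(serial)
```

The key design choice is that a failed publish leaves state behind (the queued serial and the alert), so a later job or an operator can finish the work rather than relying on the original request.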

List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

  • As stated above, Microsoft PKI Services recently changed the workflow in a way that drastically reduced the potential for failures when publishing to OCSP. For the rare cases where OCSP publishing could still fail, we have implemented alerts and manual mitigation steps. This mitigates the problems that caused this incident and will prevent future incidents.

Hi Andrew,

These three certificates are all pre-certificates generated before October 1, 2022 that do not have a final certificate. We did identify them in our review, but since there was no requirement to publish these to OCSP at the time, we intentionally filtered them out of our impacted certificates count.

Regards,
John Mason

Flags: needinfo?(johnmas)

Could you explain why you think that those certificates - presumed to be issued before October 1, 2022, based on existing pre-certificates - do not need the OCSP services expected to exist for BR-compliant certificates?

The only special reference to October 1, 2022 that I can find is in MRSP s5.5:

[...]
Effective October 1, 2022,

  • a CA MUST be able to revoke a certificate presumed to exist, if revocation of the certificate is required under this policy, even if the final certificate does not actually exist; and
  • a CA MUST provide CRL and OCSP services and responses in accordance with this policy for all certificates presumed to exist based on the presence of a precertificate, even if the certificate does not actually exist.

(emphasis mine)

That does not preclude pre-certificates older than that date from the policy, only that the explicit requirement for the CRL and OCSP services would go into effect on 2022-10-01. There is no text in that section that grandfathers certificates which we presume to be issued before the policy went into effect, so I'm not sure why you assumed that you don't need to provide the required services for those certificates.


Could you provide the full certificate data for question 5? It seems like only those already public in this issue were linked, while you mentioned that there were 2221 certificates affected in Comment 3.

Flags: needinfo?(johnmas)

Microsoft PKI Services had been involved in the draft language of this requirement (Section 5.4 of M.R.S.P. Version 2.8). Our understanding the entire time has been from the perspective of pre-certificate or final certificate issuance. With that in mind, we implemented changes to our issuance workflow with the expectation that OCSP publishing occurs for all newly issued certificates, starting October 1, 2022.

We acknowledge that there can be multiple interpretations of this requirement. We would ask for clarity from Mozilla in this case if the requirement includes all pre-certificates issued before October 1, 2022.

Flags: needinfo?(bwilson)
Attached file MSPKI.zip

I attached the certificates referenced in Comment 3.

(In reply to Dustin Hollenback from comment #7)

> Microsoft PKI Services had been involved in the draft language of this requirement (Section 5.4 of M.R.S.P. Version 2.8). Our understanding the entire time has been from the perspective of pre-certificate or final certificate issuance. With that in mind, we implemented changes to our issuance workflow with the expectation that OCSP publishing occurs for all newly issued certificates, starting October 1, 2022.

That is a reasonable interpretation, because our effective dates for policy changes typically apply to certificates issued after the effective date.

> We acknowledge that there can be multiple interpretations of this requirement. We would ask for clarity from Mozilla in this case if the requirement includes all pre-certificates issued before October 1, 2022.

The requirement does not include pre-certificates issued before October 1, 2022.
All pre-certificates issued on October 1, 2022, or later must satisfy the requirement.

Thank you for the clarification, Kathleen.

That was the last concern that I was aware of related to this bug. If there are no remaining concerns, we ask that this bug be resolved at this time.

Out of curiosity (this is not a question that should in any way delay the resolution of this ticket) -- you stated that Microsoft has already moved the generation of OCSP much earlier in the issuance workflow, to create fewer opportunities for workflow failure before the first OCSP response is issued.

Has Microsoft considered moving the creation of the first OCSP response to be before the creation of even the precertificate? All of the necessary inputs -- the serial number and the issuing CA -- can be available before the precertificate is issued.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED

(In reply to Aaron Gable from comment #11)

> Has Microsoft considered moving the creation of the first OCSP response to be before the creation of even the precertificate? All of the necessary inputs -- the serial number and the issuing CA -- can be available before the precertificate is issued.

Hi Aaron.

"Before the precertificate is issued" means that even if a serial number is "available" it is still considered to be "unused" in the context of BR 4.9.10, which also says:

'If the OCSP responder receives a request for the status of a certificate serial number that is “unused”, then the responder SHOULD NOT respond with a “good” status.'

Since "good" is frowned upon, which CertStatus value would you propose to include in a first OCSP response that is created before the precertificate is issued?

Flags: needinfo?(aaron)

The section that Rob references is stronger than a "SHOULD NOT" for CAs which are not Technically Constrained: it's actually a "MUST NOT".

While 4.9.10 speaks to the required parameters for responding to OCSP requests, it is seemingly mum on whether it is prohibited to sign a definitive response for an "unused" serial number and merely not distribute it to OCSP clients/Relying Parties.

Hi Corey. Point taken about 4.9.10. What do you make of 4.9.9?

"OCSP responses MUST either:
1. Be signed by the CA that issued the Certificates whose revocation status is being checked, or
2. Be signed by an OCSP Responder whose Certificate is signed by the CA that issued the Certificate whose revocation status is being checked."

If the intended meaning is that "...that issued the Certificate[s]" (note the past tense) has to occur before it's possible for a compliant OCSP response to exist, then this would imply that the CA is not permitted to sign a definitive OCSP response for an "unused" serial number.

Alternatively, if "whose revocation status is being checked" and the word "checking" in the section title mean that 4.9.9 is only intended to apply to OCSP responses that are actually distributed to relying parties, then ISTM that there are no rules whatsoever for OCSP responses whilst they remain undistributed. Even the "MUST conform to RFC6960 and/or RFC5019" requirement would not apply.

Flags: needinfo?(corey.bonnell)

Hi Rob,

> Alternatively, if "whose revocation status is being checked" and the word "checking" in the section title mean that 4.9.9 is only intended to apply to OCSP responses that are actually distributed to relying parties, then ISTM that there are no rules whatsoever for OCSP responses whilst they remain undistributed. Even the "MUST conform to RFC6960 and/or RFC5019" requirement would not apply.

Your reading that I quoted above most closely matches my interpretation of section 4.9.9. My understanding of the intent behind 4.9.9 is to prohibit CAs from providing OCSP responses that are unusable, i.e. that cannot be verified unless RP software has access (via local policy, etc.) to certificates and/or trust anchors other than those used for the TLS connection; it is not necessarily a restriction on what can be signed. Additionally, if section 4.9.9 restricted which OCSP responses CAs can sign, then it would prohibit pre-producing OCSP responses until the CA actually receives an OCSP request. I suppose one could argue that the CA could issue an OCSP request internally to fulfill that obligation, but that seems contrived.

Thanks,
Corey

Flags: needinfo?(corey.bonnell)

(In reply to Rob Stradling from comment #12)

> Since "good" is frowned upon, which CertStatus value would you propose to include in a first OCSP response that is created before the precertificate is issued?

As Corey pointed out, BR 4.9.10 says that it is unacceptable to respond with a "good" status, not that it is unacceptable to produce an OCSP response with the "good" status and store it for use when an OCSP request arrives.

The issue is that signing a precertificate is a binding intent to sign a final certificate, even if that precertificate is never publicly shared or logged in CT. Therefore it is risky to sign a precertificate, then sign an OCSP response, and finally make both available to the public: if the creation of the OCSP response fails, then there is a precertificate for which no OCSP response is available. (This can, of course, be mitigated with the ability to live-sign new OCSP responses as requests come in.)

Ever since https://bugzilla.mozilla.org/show_bug.cgi?id=1577652, Let's Encrypt's approach has been to first sign the OCSP response, then sign the precertificate, then persist both to the database in a single transaction (with additional automation to recover both from the audit logs if the transaction fails). This way, if signing the precertificate fails, the OCSP response is dropped and never served, which is in line with both 4.9.9 and 4.9.10.

Flags: needinfo?(aaron)

(In reply to Kathleen Wilson from comment #9)

> (In reply to Dustin Hollenback from comment #7)
>
>> Microsoft PKI Services had been involved in the draft language of this requirement (Section 5.4 of M.R.S.P. Version 2.8). Our understanding the entire time has been from the perspective of pre-certificate or final certificate issuance. With that in mind, we implemented changes to our issuance workflow with the expectation that OCSP publishing occurs for all newly issued certificates, starting October 1, 2022.
>
> That is a reasonable interpretation, because our effective dates for policy changes typically apply to certificates issued after the effective date.
>
>> We acknowledge that there can be multiple interpretations of this requirement. We would ask for clarity from Mozilla in this case if the requirement includes all pre-certificates issued before October 1, 2022.
>
> The requirement does not include pre-certificates issued before October 1, 2022.
> All pre-certificates issued on October 1, 2022, or later must satisfy the requirement.

Respectfully, this community has had a different interpretation of such cases, and this was discussed when SC31 required a reasonCode (other than unspecified) for the revocation of CA Certificates. Although the requirement became effective 2020-09-30, the expectation was that CAs had to add revocation reasons for all past revocations, because a CRL issued after 2020-09-30 had to contain a reason for all CA Certificate revocations. The discussion was conducted in m.d.s.p. and is available at https://groups.google.com/g/mozilla.dev.security.policy/c/7z6dqwdc16o/m/TVHevphhCwAJ.

The safest interpretation of such requirements is to retroactively check if the requirement applies and fix accordingly.

Flags: needinfo?(bwilson)

I've opened a conversation on m-d-s-p with the hope of resolving some of the underlying policy issues discussed in the comments made here thus far. See https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/x3sRo8tALr0/m/cjLHyFQOBAAJ .
Thanks,
Ben

Flags: needinfo?(bwilson)
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ocsp-failure]
Summary: Microsoft: "unknown" OCSP response for issued certificates → Microsoft PKI Services: "unknown" OCSP response for issued certificates