Closed Bug 1879552 Opened 2 years ago Closed 1 year ago

Microsoft PKI Services: OCSP Responder does not know a Certificate

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: johnmas, Assigned: johnmas)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0

PRELIMINARY NOTICE

Incident Report DRAFT

On Thursday, February 8, 2024, at approximately 8:58 AM Pacific Time, MS PKI Services was notified by Ben Wilson at Mozilla that one of our certificates was unknown to our OCSP responder. Ben provided a link to the OCSP Watch tool (https://sslmate.com/labs/ocsp_watch/) and the error message the tool reported for the certificate.

We began our investigation and determined that we indeed had at least one Final Certificate that was unknown to our OCSP server.
We have uploaded the certificate to our OCSP server and the certificate now has a proper OCSP response and the error is no longer showing up in the OCSP Watch tool.

We will respond back to this Bugzilla bug on or before February 15 with our Root Cause Analysis.

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.

Notified by Mozilla as described above.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Note: All times in Pacific Time (PT) (unless otherwise noted).

2023-12-13 21:54:34 GMT: Final Certificate in question was created (https://crt.sh/?sha256=5db268442e9e9b9919ee3ab0bc2e397e339972c8156ea901d002ac4508d7164a)
2024-02-08 08:58: MS PKI Services became aware that our OCSP did not know this certificate.
2024-02-08 ~18:00: MS PKI Services updated the OCSP server with the Final Certificate and repaired the Unknown Status issue for this certificate.
2024-02-08 ~18:30: MS PKI Services verified that OCSP Watch Tool was no longer reporting this certificate not known to our OCSP server.

Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Microsoft PKI Services has not stopped certificate issuance at this time. While we consider this a serious process failure, the OCSP Watch tool has identified only one certificate. We do not currently believe this is a widespread failure. If our investigation indicates otherwise, we will stop issuance immediately.

In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

We have only identified one certificate impacted during this incident at this time.

In a case involving TLS server certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. It is also recommended that you use this form in your list "https://crt.sh/?sha256=[sha256-hash]", unless circumstances dictate otherwise. When the incident being reported involves an SMIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA256 hash of the certificate. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

We have only identified one certificate impacted during this incident at this time.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We are working on a Root Cause Analysis and will report our findings on or before February 15, 2024.

List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

We are working on a Root Cause Analysis and will report our findings on or before February 15, 2024.

Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---
Component: General → CA Certificate Compliance
Product: Invalid Bugs → CA Program
Assignee: nobody → johnmas
Type: defect → task
Whiteboard: [ca-compliance] [ocsp-failure]

UPDATED Incident Report

Microsoft PKI Services: OCSP Responder does not know a Certificate

Incident Report (All reference times are in Pacific Time)

On 2024-02-08 08:58, MS PKI Services was notified by Ben Wilson at Mozilla that we had one certificate that was unknown to our OCSP responder. Ben provided a link to the OCSP Watch tool (https://sslmate.com/labs/ocsp_watch/) and the error message the tool reported for the certificate.

We began our investigation and determined that we indeed had at least one Certificate that was unknown to our OCSP server. On 2024-02-08 ~16:00, we uploaded the certificate to our OCSP server and confirmed that we received a proper OCSP response and that the error no longer appeared in the OCSP Watch tool.

We continued investigating the incident and discovered additional certificates that were unknown to our OCSP Responder. Our team suspected that something was wrong with 1 of the 4 instances of our tools, which exist in two different data centers (we will call the suspicious instance “Z4”). Z4 is new and went live for the first time on 2023-11-28. Within an hour of indications that Z4 was the root of the issue, on 2024-02-12 06:51, we suspended the issuance of all certificates from Z4.

We were able to finalize a list of all certificates that were impacted by this incident. We compared serial numbers of all active certificates (~22 million) against all serial numbers in OCSP. We identified 100 more certificates that were not known to our OCSP responder for a total of 101 certificates.
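
For readers who want to picture the comparison described above, it amounts to a set difference between issued serial numbers and the serial numbers the responder knows about. The following is a minimal sketch only, not Microsoft's actual tooling: the function names `normalize_serial` and `find_unknown_serials`, the hex-string input format, and the normalization rules are all assumptions made for illustration.

```python
from typing import Iterable, Set


def normalize_serial(serial: str) -> str:
    """Canonicalize a hex serial: drop separators, a 0x prefix, and leading zeros.

    The input format (hex text) is an assumption for this sketch.
    """
    cleaned = serial.strip().lower().replace(":", "").replace(" ", "")
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return cleaned.lstrip("0") or "0"


def find_unknown_serials(
    issued: Iterable[str], known_to_ocsp: Iterable[str]
) -> Set[str]:
    """Return issued serials that are missing from the responder's set."""
    known = {normalize_serial(s) for s in known_to_ocsp}
    return {normalize_serial(s) for s in issued} - known


if __name__ == "__main__":
    # Hypothetical sample data; real inputs would come from the CA database
    # and the OCSP responder's store.
    issued = ["0A:1B", "00ff", "0x2C"]
    known = ["a1b", "2c"]
    print(sorted(find_unknown_serials(issued, known)))
```

Normalizing both sides before comparing matters in a check like this, because the same serial number is often rendered differently by different systems (case, separators, leading zeros).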

All certificates impacted were issued from Z4 sometime after 2023-11-28. The Z4 instance issued ~1.7 million certificates from 2023-11-28 until 2024-02-12 ~6:51 when it was taken out of production.

As of 2024-02-14 ~13:30 all impacted certificates had been uploaded to our OCSP Responder and were providing appropriate responses.

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.

Notified by Mozilla as described above.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Note: All times in Pacific Time (PT) (unless otherwise noted).

2023-11-28: Instance "Z4" of the MS PKI Tools went live in production and began issuing certificates.
2023-12-13 21:54:34 GMT: Final Certificate for Initial Problem Report was created (https://crt.sh/?sha256=5db268442e9e9b9919ee3ab0bc2e397e339972c8156ea901d002ac4508d7164a)
2024-02-08 08:58: MS PKI Services became aware that our OCSP did not know this certificate.
2024-02-08 ~18:00: MS PKI Services updated the OCSP server with the Final Certificate for the certificate named in the original Problem Report and repaired the Unknown Status issue for this certificate.
2024-02-08 ~18:30: MS PKI Services verified that OCSP Watch Tool was no longer reporting this certificate not known to our OCSP server.
2024-02-12 06:51: Suspended operation of the Z4 instance of our tools (1 of 4 instances that issue certificates), as we suspected that the root cause of this incident came from this instance.
2024-02-12 ~18:00: Confirmed all problem certificates (101) out of ~22 million live certificates and that they all came from the suspicious instance of our tools, Z4, in the period since that instance went live.
2024-02-14 ~13:30: Confirmed that all 101 certificates without OCSP responses were successfully uploaded to our OCSP responder and were now responding.

Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Microsoft PKI Services did not stop issuance when the initial certificate was discovered (2024-02-08). We did stop issuance from 1 of the 4 instances of our tools (on 2024-02-12 06:51), once we suspected that the root cause originated in that instance (see timeline). While we consider this a serious process failure, we are confident that we have identified and remediated the root cause of this incident. If our investigation indicates otherwise, we will stop issuance immediately.

In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

We have identified 101 certificates impacted during this incident. The first certificate was issued on 2023-12-13 and the last one was issued on 2024-02-10. There are no problems with the certificates themselves; however, these certificates were not successfully published to our OCSP responder. All certificates have now been published successfully, and we have confirmed that they have appropriate responses.
https://crt.sh/?id=11405817732, https://crt.sh/?id=11461904506, https://crt.sh/?id=11463466679, https://crt.sh/?id=11463937816, https://crt.sh/?id=11463944792, https://crt.sh/?id=11464147843, https://crt.sh/?id=11464184580, https://crt.sh/?id=11464277182, https://crt.sh/?id=11464875346, https://crt.sh/?id=11549567945, https://crt.sh/?id=11669641690, https://crt.sh/?id=11694630711, https://crt.sh/?id=11709486571, https://crt.sh/?id=11711100026, https://crt.sh/?id=11711636778, https://crt.sh/?id=11713411012, https://crt.sh/?id=11714833254, https://crt.sh/?id=11716096130, https://crt.sh/?id=11716533623, https://crt.sh/?id=11720298011, https://crt.sh/?id=11722757169, https://crt.sh/?id=11724495715, https://crt.sh/?id=11797548593, https://crt.sh/?id=11797549593, https://crt.sh/?id=11800580359, https://crt.sh/?id=11857074627, https://crt.sh/?id=11862548823, https://crt.sh/?id=11865482922, https://crt.sh/?id=11879580760, https://crt.sh/?id=11909243461, https://crt.sh/?id=11909891626, https://crt.sh/?id=11912467249, https://crt.sh/?id=11913980309, https://crt.sh/?id=11915247463, https://crt.sh/?id=11920373286, https://crt.sh/?id=11920574085, https://crt.sh/?id=11923060690, https://crt.sh/?id=11925238097, https://crt.sh/?id=11925257384, https://crt.sh/?id=11930826596, https://crt.sh/?id=11935694574, https://crt.sh/?id=11937159164, https://crt.sh/?id=11940942112, https://crt.sh/?id=11941974330, https://crt.sh/?id=11941986382, https://crt.sh/?id=11942068681, https://crt.sh/?id=11942188173, https://crt.sh/?id=11942544933, https://crt.sh/?id=11943548222, https://crt.sh/?id=11943770278, https://crt.sh/?id=11946880353, https://crt.sh/?id=11947378687, https://crt.sh/?id=11947369370, https://crt.sh/?id=11947369375, https://crt.sh/?id=11947369255, https://crt.sh/?id=11947378534, https://crt.sh/?id=11947378984, https://crt.sh/?id=11947369268, https://crt.sh/?id=11947380329, https://crt.sh/?id=11947377669, https://crt.sh/?id=11948606087, https://crt.sh/?id=11954470142, 
https://crt.sh/?id=11959272112, https://crt.sh/?id=11964060717, https://crt.sh/?id=11966393224, https://crt.sh/?id=11969971757, https://crt.sh/?id=11971139584, https://crt.sh/?id=11974286740, https://crt.sh/?id=11982922086, https://crt.sh/?id=11983790859, https://crt.sh/?id=11985594608, https://crt.sh/?id=11988388633, https://crt.sh/?id=11988885876, https://crt.sh/?id=11988908250, https://crt.sh/?id=11989643030, https://crt.sh/?id=11990111261, https://crt.sh/?id=11993469782, https://crt.sh/?id=11994949823, https://crt.sh/?id=11996290122, https://crt.sh/?id=11997947051, https://crt.sh/?id=11998317363, https://crt.sh/?id=11998911202, https://crt.sh/?id=11999302426, https://crt.sh/?id=12001445042, https://crt.sh/?id=12001470412, https://crt.sh/?id=12002896757, https://crt.sh/?id=12003913882, https://crt.sh/?id=12009708680, https://crt.sh/?id=12010127458, https://crt.sh/?id=12010993433, https://crt.sh/?id=12020911490, https://crt.sh/?id=12021183108, https://crt.sh/?id=12022797995, https://crt.sh/?id=12025920832, https://crt.sh/?id=12026849620, https://crt.sh/?id=12027359303, https://crt.sh/?id=12031820583, https://crt.sh/?id=12032894619, https://crt.sh/?id=12036573030, https://crt.sh/?id=12036624025, https://crt.sh/?id=12038973100

In a case involving TLS server certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. It is also recommended that you use this form in your list "https://crt.sh/?sha256=[sha256-hash]", unless circumstances dictate otherwise. When the incident being reported involves an SMIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA256 hash of the certificate. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

See above for list of impacted certificates.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The root cause of this incident was twofold:

  • There is code in our tools that allows the certificate issuance workflow to continue and not alert when we experience an OCSP publishing failure. A configuration file in the tool has a setting to activate the code.
  • We mis-configured a new instance of our tools, called “Z4”, that went live on 2023-11-28, activating the code.

The incident resulted in a small percentage of certificates issued from Z4 not being present in OCSP Responders (see the 101 certificates impacted above).

Our workflow skipped verification after attempting to publish certificates to our OCSP responder. When publishing had occasional transient issues, the workflow continued to assume success since the verification was skipped. We will follow up with more detail on changing this logic in our Repair Items.
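
To illustrate the failure mode described above, the sketch below shows a publish step that always verifies the responder's state before reporting success, so a transient publishing problem surfaces as an error instead of being assumed away. It is a hedged illustration, not our production code: `publish_and_verify`, `OcspPublishError`, the `publish` and `is_known` callables, and the retry count are hypothetical names and choices introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable


class OcspPublishError(Exception):
    """Raised when a certificate cannot be confirmed as known to the responder."""


@dataclass
class PublishResult:
    serial: str
    attempts: int


def publish_and_verify(
    serial: str,
    publish: Callable[[str], None],
    is_known: Callable[[str], bool],
    max_attempts: int = 3,
) -> PublishResult:
    """Publish a certificate, then confirm the responder actually knows it.

    `publish` and `is_known` are placeholders for whatever the real publishing
    and lookup calls are. Verification is not optional here: there is no flag
    that lets the workflow skip it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            publish(serial)
        except ConnectionError:
            continue  # treat as transient and retry
        if is_known(serial):
            return PublishResult(serial=serial, attempts=attempt)
    # Fail loudly so issuance can halt and alerting can fire.
    raise OcspPublishError(
        f"serial {serial} not confirmed after {max_attempts} attempts"
    )
```

The design point is that success is defined by what the responder reports, not by the absence of an exception from the publish call, and that there is no configuration path around the check.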

All other instances of our service were configured correctly with the additional checks, and we have not had any other failures in the publishing process (besides the 101 certificates in Z4 since late November 2023).

The Z4 instance issued approximately 1.7 million certificates from 2023-11-28 until 2024-02-12, when it was taken out of production. Of those certificates, 101 were not published correctly to our OCSP responder and resulted in this Incident Report (a 0.006% failure rate on this instance).
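
The failure rate quoted above can be reproduced with a short calculation. This is a sketch under the stated figures; the function name `failure_rate_percent` is hypothetical, and because the issued count is approximate (~1.7 million), the result is approximate as well.

```python
def failure_rate_percent(failed: int, issued: int) -> float:
    """Return failed/issued expressed as a percentage."""
    if issued <= 0:
        raise ValueError("issued must be positive")
    return 100.0 * failed / issued


if __name__ == "__main__":
    # 101 unpublished certificates out of roughly 1.7 million issued by Z4.
    rate = failure_rate_percent(101, 1_700_000)
    print(f"{rate:.3f}%")  # prints 0.006%
```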

We also want to address a previous Bugzilla bug that we filed in Oct 2022 (https://bugzilla.mozilla.org/show_bug.cgi?id=1793443). We have confirmed that the root cause of this incident (Feb 2024) is different from that of the Oct 2022 incident. This incident came about because we had legacy code that allowed the workflow to bypass monitoring and alerting. So, even though we added monitoring and alerting after the previous incident (Oct 2022), this legacy code allowed those checks to be bypassed when we mis-configured a new instance of the tool.

List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

The team is working on a full list of repair items for this incident. We do not yet have all of them identified with committed dates. We will have an updated list within 7 days (by February 22).

We have already implemented the following Repair Items:

  • We have published the final certificates for all 101 impacted certificates to the OCSP Responders and confirmed we have good responses for all. (Complete).
  • We have updated our Deployment checklist and procedures to ensure we have adequate checks that this configuration is included in the setup of each instance of our tools. (Complete)

Apologies, please ignore Comments 3 and 5; they are incomplete.

Comment 4 is complete.

Sorry for the duplicate postings, I had trouble with adding certificate links and some duplications happened.

Hi, are Microsoft intending to migrate across to the new reporting template for future incidents?

Malcolm, thank you for the question in Comment 7. You are correct: we should have responded to this incident with the newer Incident Report template provided by CCADB. This was an oversight on my part. I have updated our internal Incident Process with the newer template, and we will use it for all future incidents.

Action Items/Repair Items

Here is an updated list of Repair Items for this incident. We will provide an update on progress in two weeks (by March 8).

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Published the final certificates for all 101 impacted certificates to the OCSP Responders and confirmed we have good responses for all. | Correction | Complete 2024-02-14 |
| Updated our Deployment checklist and procedures to ensure we have adequate checks that this configuration is included in the setup of each instance of our tools | Correction | Complete 2024-02-14 |
| Validate configuration flags and remove any setting that should not be a configurable option | Correction | 2024-03-28 |
| Enhance monitoring/alerts for Publishing failures to OCSP Responder | Detection | 2024-03-28 |

Our team is working on the Repair Items updated above and is still on track to deliver them all by 2024-03-28. We will provide our next update on progress in two weeks (by 2024-03-22).

Our team is working on the Repair Items updated above and is still on track to deliver the open Repair Items by 2024-03-28. We will provide our last update on progress next week (by 2024-03-28).

Our team has completed all Repair Items. With that work completed, we respectfully request that this bug be closed.

Action Items/Repair Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Published the final certificates for all 101 impacted certificates to the OCSP Responders and confirmed we have good responses for all. | Correction | Complete 2024-02-14 |
| Updated our Deployment checklist and procedures to ensure we have adequate checks that this configuration is included in the setup of each instance of our tools | Correction | Complete 2024-02-14 |
| Validate configuration flags and remove any setting that should not be a configurable option | Correction | Complete 2024-03-27 |
| Enhance monitoring/alerts for Publishing failures to OCSP Responder | Detection | Complete 2024-03-27 |

I am flagging this for closure on Friday, 29-Mar-2024, unless there are issues to discuss.

Flags: needinfo?(bwilson)
Status: REOPENED → RESOLVED
Closed: 2 years ago → 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED