Closed Bug 1693930 Opened 3 years ago Closed 3 years ago

Microsoft PKI Services: Policy Documentation, Failure to update Subscriber Certificate Max Validity Period

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: johnmas, Assigned: johnmas)

Details

(Whiteboard: [ca-compliance] [policy-failure])

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4407.0 Safari/537.36 Edg/90.0.789.1

Type: defect → task
  1. How your CA first became aware of the problem.

Microsoft PKI Services has identified that we failed to update our “Microsoft PKI Services Certification Practices Statement v3.1.7”.

During internal reviews, we noticed that Section 6.3.2, regarding Subscriber Certificate Maximum Validity Periods, had not been updated to reflect the change to the Baseline Requirements that took effect on September 1, 2020.
The Microsoft PKI Services team first discovered this problem on February 4, 2021, while creating a new CPS that we are working on.

  2. A timeline of the actions your CA took in response.

A. 2021-Feb-04 - We became aware of this issue in our existing Services CPS v3.1.7, while preparing and reviewing a new CPS, Third Party CPS v1.0.0, that Microsoft PKI Services will publish in the near future.
B. 2021-Feb-05 - We confirmed that our certificate issuance processes related to this CPS are compliant with the updated max validity period of 398 days. We did this by verifying that all our Subscriber Certificate Templates are configured accordingly. During the review of our Subscriber Certificate Templates and their corresponding change logs, we determined that since we started issuing Certificates related to this CPS, our certificate validity period has never been more than one year.
C. 2021-Feb-10 - We reviewed and verified that we have not issued any certificates since the BR change in September 2020 with a longer validity period than allowed under the updated BR standard (398 days).
D. 2021-Feb-15 - We finalized a new version of the CPS that is currently in the process of review and approval with our Policy Authority.
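
The check described in steps B and C can be sketched roughly as follows. This is a minimal illustration, not the tooling Microsoft PKI Services used. The tuple format passed to `find_problem_certs` is hypothetical, `notBefore` is used as a stand-in for the issuance date, and the validity period is assumed to count both endpoints.

```python
from datetime import datetime, timedelta, timezone

# Limit and effective date taken from the report above (398 days, September 1, 2020).
MAX_VALIDITY = timedelta(days=398)
EFFECTIVE_DATE = datetime(2020, 9, 1, tzinfo=timezone.utc)


def validity_period(not_before, not_after):
    """Return the validity period, counting both endpoints (assumption)."""
    return (not_after - not_before) + timedelta(seconds=1)


def exceeds_max_validity(not_before, not_after):
    """True if a certificate dated on or after the effective date is too long-lived."""
    if not_before < EFFECTIVE_DATE:
        return False  # Dated before the change took effect; out of scope for this check.
    return validity_period(not_before, not_after) > MAX_VALIDITY


def find_problem_certs(certs):
    """certs: iterable of (serial, not_before, not_after) tuples (hypothetical export format)."""
    return [serial for serial, nb, na in certs if exceeds_max_validity(nb, na)]
```

Running `find_problem_certs` over an export of issued certificates and getting an empty list would correspond to the finding in step C that no certificate exceeded the limit.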

  3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

We did not stop certificate issuance in response to this issue, because we were able to verify that our underlying processes followed the BRs.

  4. In a case involving certificates, a summary of the problematic certificates.

We have not discovered any problematic certificates related to this incident.

  5. In a case involving certificates, the complete certificate data for the problematic certificates.

Not applicable at this point.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Updating the Max Validity period in the documentation to match the new requirements was missed during the periodic review of our CPS in July 2020. We had identified this change and confirmed that our underlying processes were compliant with the Max Validity Period prior to September 1, 2020.

We believe we need to improve our ability to synchronize the updates to our systems and the updates to our Policy Documents. We believe this is the main repair item for us related to this issue.

  7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Further mitigation steps that are planned:

  • Post an updated CPS to our Repository, v3.1.8 with this issue corrected (expected by February 26, 2021).
    
  • Review and update our Policy Documents Review procedures and practices to prevent similar documentation mistakes in the future.
    

Thanks, John, for taking proactive steps in reporting this and working to resolve it.

Assignee: bwilson → johnmas
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Microsoft PKI Services has posted an Updated CPS to our Repository (https://www.microsoft.com/pkiops/docs/repository.htm), v3.1.8 with this issue corrected per the discussion above.

We are still working on a complete review and update of our Policy Document Review procedures and practices and will report back on the completion of this work.

Microsoft PKI Services has reviewed and updated our procedures for Policy Document Reviews. Our updated process more formally tracks changes to CA/Browser Forum requirements and Root Store Policies using a ticketing system (in our case, Azure DevOps). A cross-discipline team (Dev, PM, and Operations) formally triages these tickets every two weeks for changes to policy. Additionally, the team ensures that any necessary changes to policy are put in place (tracked formally) and monitors their implementation for compliance with any associated implementation dates.

With the implementation of this update, we ask that this bug be resolved.

I appreciate Microsoft's confidence that this issue will not recur, but I'm not sure I really see an explanation in Comment #1 / Comment #4 about where/how things went wrong. That is, the incident report in Comment #1 mostly seems to focus on the symptoms (the out of date CP/CPS), but without a discussion of the root cause.

Comment #4 suggests that new processes in place will prevent that, but it's unclear if these are entirely new processes or modifications of existing processes. If they're entirely new, why weren't such processes in place before, and are there other gaps that might exist in terms of things expected of every CA? We've seen some confusion from Microsoft in the past (e.g. Bug 1711147, Comment #6 talks about update expectations, Bug 1705419, Comment #6 talked about pre-issuance linting, Bug 1670337 talked heavily about CAA), and so it's natural to try to figure out if there are other gaps that can be identified before causing an incident. Similarly, if this is a modification of existing procedures, then it's useful to understand what those procedures were, and how they changed (similar to the feedback on Bug 1700809, Comment #10).

Flags: needinfo?(johnmas)

Thanks for the clarifying questions, Ryan; they are always appreciated.

The root cause of this issue was that a manual review process missed that we did not update our CPS when we updated the underlying processes (which were in compliance with the BRs by the September 1, 2020 implementation date). Our process did identify that we needed to change in response to a change in industry requirements (BRs), and we did update our process, but not our Policy Documents. Our process clearly missed this particular requirement (and did not update the CPS at the time). Additionally, when we attempted to discover how this happened, there was not enough evidence left over from our process at the time to figure out exactly what went wrong.

That is why our updated process has more formality for tracking industry changes (be they CAB Forum or Root Store Policy) via ticketing systems, so that we can more precisely track each change and each step in our process. Additionally, we have formed a v-team inside our group that will increase the number of eyeballs that are reviewing each change and help prevent issues like this in the future. This v-team will have review gates for each industry change to ensure we have implemented the changes into our policy documents and processes and met appropriate implementation dates.

This new process is an iteration of our existing processes; we have not started from scratch. The older process was less formal and was tracked using checklists in OneNote rather than a ticketing system. The older process involved fewer people on the team and did not necessarily include all disciplines for review and approval of changes.
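
As a rough illustration of the review gates described in this comment, a tracked change can be modelled as a record that stays open until both the process and the policy documents have been updated. This is a hypothetical sketch only. The `RequirementChange` record and the `overdue` helper are invented for illustration and do not reflect any Azure DevOps schema or Microsoft's actual tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RequirementChange:
    """One tracked industry change (hypothetical record, not an Azure DevOps schema)."""
    title: str
    implementation_date: date
    process_updated: bool = False
    policy_docs_updated: bool = False

    def open_gates(self):
        """Names of review gates that have not been passed yet."""
        gates = []
        if not self.process_updated:
            gates.append("process")
        if not self.policy_docs_updated:
            gates.append("policy documents")
        return gates


def overdue(changes, today):
    """Changes whose implementation date has arrived with at least one gate still open."""
    return [c for c in changes if c.open_gates() and today >= c.implementation_date]
```

Under this model, the incident in this bug would appear as a change with `process_updated=True` and `policy_docs_updated=False`: it would be reported by `overdue` from September 1, 2020 onward, with "policy documents" as the only open gate.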

Flags: needinfo?(johnmas)

Thanks. Comment #6 makes it much clearer about where and how things went wrong, and how the changes are specific to addressing that.

Sending to Ben to see if he has further questions.

Flags: needinfo?(bwilson)

I don't have any further questions at the present.

Flags: needinfo?(bwilson)

With this feedback can we please resolve the bug?

I will calendar this for closure to be done on or about this Friday, 16-July-2021.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [policy-failure]