Closed Bug 1848280 Opened 1 year ago Closed 1 year ago

Microsoft PKI Services: 3-Month Access Review Process Failure

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dustin.Hollenback, Assigned: Dustin.Hollenback)

Details

(Whiteboard: [ca-compliance] [policy-failure] Next update 2023-10-23)

Preliminary report

A Microsoft PKI Services engineer identified that a user account had been provisioned for an employee that is not in a Trusted Role. That investigation also identified a separate problem related to 3-month access reviews which did not identify this user account creation for a person not assigned to a Trusted Role. This failed to meet Network Security Requirements Section 2.j.: “Review all system accounts at least every three (3) months and deactivate any accounts that are no longer necessary for operation”.

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.

This problem was self-identified while investigating another issue and it became clear on 2023-08-10 at 16:15 Pacific time that the 3-Month Access Review Process failed to identify this access issue.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Note: All times in Pacific Time (PT).

2022-11-22 13:54: User account created for Non-Trusted Role user
2022-12-28: Access review performed
2023-04-11: Access review performed
2023-06-20: Access review performed
2023-08-09 13:09: Trusted Role Engineer performed random audit and discovered Non-Trusted Role user account within High Security Zone and opened internal Incident
2023-08-09 13:15: User account for Non-Trusted Role user was deleted
2023-08-10 16:15: Investigation identified 3-Month Access Review process problem

Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Microsoft PKI Services has not stopped certificate issuance. While we consider this a serious process failure by not identifying a user account for a user not assigned to a Trusted Role, we are confident that the environment was secure. The user who was granted access is a Microsoft employee who regularly works on internal CA systems, but was not assigned to a Trusted Role and restricted from our High Security Zone only for the purposes of limiting access.

In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Certificates were not impacted by this process failure. We identified a problem with our access review process.

In a case involving TLS server certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. It is also recommended that you use this form in your list "https://crt.sh/?sha256=[sha256-hash]", unless circumstances dictate otherwise. When the incident being reported involves an SMIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA256 hash of the certificate. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

Certificates were not impacted by this process failure.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We are still investigating and expect to have a complete report within one week.

List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

We are still investigating the root causes and remediation steps to prevent repeating this issue in the future.

Assignee: nobody → Dustin.Hollenback
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [policy-failure]

Incident Report

A Microsoft PKI Services engineer identified that a user account had been provisioned for an employee that is not in a Trusted Role. That investigation also identified a separate problem related to 3-month access reviews which did not identify this user account creation for a person not assigned to a Trusted Role. This failed to meet Network Security Requirements Section 2.j.: “Review all system accounts at least every three (3) months and deactivate any accounts that are no longer necessary for operation”.

How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.

This problem was self-identified while investigating another issue and it became clear on 2023-08-10 at 16:15 Pacific time that the 3-Month Access Review Process failed to identify this access issue.

A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Note: All times in Pacific Time (PT).
2022-11-22 13:54: User account created for Non-Trusted Role user
2022-12-28: Access review performed, but Non-Trusted Role user was not found in report
2023-04-11: Access review performed, but Non-Trusted Role user was not found in report
2023-05-09 13:40: Report data bug mitigated which resolved missing users from access review report
2023-06-20: Access review performed, and Non-Trusted Role user was found in report
2023-08-09 13:09: Trusted Role Engineer performed random audit and discovered Non-Trusted Role user account within High Security Zone and opened internal Incident
2023-08-09 13:15: User account for Non-Trusted Role user was deleted
2023-08-10 16:15: Investigation identified 3-Month Access Review process problems
2023-08-17 12:04: Updated manual 3-Month Access Review process for Secure Zone and High Security Zone to include an explicit check for Trusted Role group membership

Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Microsoft PKI Services has not stopped certificate issuance. While we consider this a serious process failure by approving a user who has not been formally assigned to a Trusted Role, we are confident that the environment was secure. The user who was granted access is a Microsoft employee who regularly works on internal CA systems but was not formally assigned to a Trusted Role and restricted from our Secure Zone.

In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Certificates were not impacted by this process failure. We identified problems with our access review process.

In a case involving TLS server certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. It is also recommended that you use this form in your list "https://crt.sh/?sha256=[sha256-hash]", unless circumstances dictate otherwise. When the incident being reported involves an SMIME certificate, if disclosure of personally identifiable information in the certificate may be contrary to applicable law, please provide at least the certificate serial number and SHA256 hash of the certificate. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

Certificates were not impacted by this process failure.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Between 2022-11-22, when the Trusted Role Control failure could first have been detected, and 2023-08-09, when we detected and removed the mistaken access, we performed three (3) 3-month access reviews.

The first two (2) reviews (2022-12-28 and 2023-04-11) did not detect the mistaken user account at all. The access data for these reviews is collected in an automated way and had been filtered, and the filter had a bug that hid some users from the report. This bug was identified and mitigated on 2023-05-09, and all subsequent reports showed all users.

The last 3-month review (2023-06-20) revealed the mistaken user account, but the reviewer recognized the user as someone who was expected to have access to the Secure Zone and did not verify that the user was explicitly listed on the Trusted Role group list. The documented manual process did not include a step to explicitly check for inclusion in the Trusted Role group. The reviewer did note that the person had been granted access to provide a time-limited deployment function and should retain access for that purpose.

During this review it also became clear that we had missed the requirement for these reviews to be completed at least every 3 months (2 of these reviews met that criterion, 1 did not). We will therefore also take action to ensure that we comply with that control.
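The spacing between the reviews listed in the timeline can be checked with simple date arithmetic. The following is a minimal sketch, not part of Microsoft's tooling: the function name is hypothetical, the dates are copied from the timeline above, and the 90-day limit is an assumption taken from the remediation commitment in this report. The review preceding 2022-12-28 is not listed in the timeline, so only the two intervals between the three listed reviews are covered.

```python
from datetime import date

# Review dates copied from the timeline in this report.
REVIEW_DATES = [date(2022, 12, 28), date(2023, 4, 11), date(2023, 6, 20)]

# Assumption: "at least every 3 months" is interpreted as at most 90 days,
# matching the remediation commitment in this report.
MAX_GAP_DAYS = 90


def review_gaps(dates, max_gap_days=MAX_GAP_DAYS):
    """Return (previous, current, gap_days, within_limit) per consecutive pair."""
    ordered = sorted(dates)
    results = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = (cur - prev).days
        results.append((prev, cur, gap, gap <= max_gap_days))
    return results


for prev, cur, gap, ok in review_gaps(REVIEW_DATES):
    status = "within limit" if ok else "exceeds limit"
    print(f"{prev} -> {cur}: {gap} days ({status})")
```

Under these assumptions, the interval from 2022-12-28 to 2023-04-11 is 104 days and exceeds the limit, while the interval from 2023-04-11 to 2023-06-20 is 70 days.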

List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

On 2023-08-17 we updated the manual 3-Month Review processes for our Secure Zone and High Security Zone to include a check of the Trusted Role group list. This should ensure that all future manual checks validate that the user is a member of the Trusted Role group and take appropriate action in the event of failure.

We will update our manual 3-Month Review processes for our Secure Zone and High Security Zone to a tighter timetable and clarify that they MUST be performed at least every 90 days. We will have the processes updated and perform another 3-month review by 2023-08-25.

We will review and, as appropriate, further update the automation that collects user accounts for 3-Month Access Reviews so that it presents unfiltered data and lists all accounts with access. We will have a committed date for this by 2023-08-25.

We will automate verification of all user accounts that have access to the Secure Zone and High Security Zone to ensure they are a valid user in the Trusted Role group and take appropriate action in the event of failure. The date for completion of this automation will be provided by 2023-08-25.
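The automated verification described above amounts to comparing the set of accounts with zone access against the Trusted Role group list. The sketch below only illustrates that comparison; it does not reflect the actual Microsoft PKI Services implementation. The function name, account names, and data sources are all hypothetical, and a real system would pull both lists from its directory and access-management systems.

```python
def find_accounts_outside_trusted_roles(zone_accounts, trusted_role_members):
    """Return accounts that have zone access but are not in the Trusted Role group."""
    return sorted(set(zone_accounts) - set(trusted_role_members))


# Hypothetical example data, for illustration only.
zone_accounts = ["alice", "bob", "carol"]
trusted_role_members = ["alice", "bob"]

for account in find_accounts_outside_trusted_roles(zone_accounts, trusted_role_members):
    # "Appropriate action" is left to the operator; this sketch only reports.
    print(f"FLAG: {account} has zone access but is not in the Trusted Role group")
```

Because the check is a set difference over the unfiltered account list, it does not depend on a reviewer recognizing a name, which is the step that failed in the 2023-06-20 review.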

Whiteboard: [ca-compliance] [policy-failure] → [ca-compliance] [policy-failure] Next update 2023-08-25

The manual 3-Month Review process has been updated to be performed at least every 90 days.

We are still working on the automation to collect user accounts for 3-Month Access Reviews and expect this to be complete next week.

We have not been able to lock on the final scope of work for the Secure Zone member verification automation and will provide a better estimate next week with a date we can commit to completing the work.

Since our last update, we were able to close a task and identify commitment dates for some of the open work to ensure this issue does not recur.

We reviewed the automation that collects user accounts for 3-Month Access Reviews on 2023-08-29 and verified it provides unfiltered data that will ensure all accounts with access are listed.

We will automate verification of all user accounts that have access to the Secure Zone to ensure they are a valid user in the Trusted Role group and take appropriate action in the case of a failure. This will be implemented by 2023-10-13.

Whiteboard: [ca-compliance] [policy-failure] Next update 2023-08-25 → [ca-compliance] [policy-failure] Next update 2023-10-23

MS PKI Services completed the centralized management of the Trusted Role group list which has allowed automation to replace several manual processes for comparing with that list. This included adding automation to verify all users with access to the Secure Zone are listed within an appropriate Trusted Role.

With that work completed, we respectfully request that this bug be closed.

I intend to close this on Wed. 11-Oct-2023.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED