Closed Bug 1931413 Opened 9 months ago Closed 8 months ago

Google Trust Services: New hire onboarding deviation from written procedure

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gts-external, Assigned: gts-external)

Details

(Whiteboard: [ca-compliance] [policy-failure])

During the onboarding process for a new hire, it was identified that membership in some access groups was granted in a different order than our procedures specify. The access group membership did not result in access to certificate systems; however, since it was a deviation from our documented procedures, we are preparing an incident report to explain the situation and the changes we plan to make.

Google Trust Services is investigating the issue and will publish a full report by 2024-11-22.

Assignee: nobody → gts-external
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [policy-failure]

Incident Report

Summary

During the onboarding process for a new hire, we identified that some access control group memberships were granted in a different order than our procedures specify, which resulted from multiple process deviations. The applicable control says: "Personnel in Trusted Roles receive training as relevant to their job duties and must pass an examination. The training is delivered during onboarding and repeated annually. Its content covers the policies and procedures applicable to the respective job role. Personnel who perform validation functions must complete the training before receiving access to CA systems." In this case, training was complete but the examination had not been completed before membership in access control groups was granted. Membership in these groups is necessary but not sufficient alone to gain access to CA systems.

Current process

To gain access to CA systems, a new hire to Google Trust Services (GTS) must go through the below series of steps:

  1. The new hire undergoes a training program that culminates with an examination.

  2. Once the new hire passes the examination, they submit requests to be added to access control groups, then file an access ticket referencing those requests. The access ticket is submitted to our Policy Authority (PA) who are the only ones authorized to approve the access control group membership requests in the access ticket.

  3. The PA validates the access ticket and that the requirements have been met and then either approves or rejects it. In the case of rejection, the process stops. Otherwise, after the access ticket has been approved, the process continues.

  4. The PA approves the access control group requests referenced in the ticket.

  5. The new hire creates a changelist (CL) to add a unique user account for themselves on CA systems. Once the CL is approved, the change is scheduled for deployment to production during the next rollout, which typically occurs on a weekly basis.

  6. After a rollout deploying the new user account has completed, the new hire is capable of accessing CA systems.

As described above, gaining access to CA systems within the GTS CA environment requires both access control group grants and a configuration deployment. Without both sets of changes in place, access to CA systems is not possible.
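The two-part gate described above can be pictured as a simple predicate. This is an illustrative sketch only, not a description of GTS tooling; the names `AccessState`, `in_access_groups`, `account_deployed`, and `can_access_ca_systems` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessState:
    """Hypothetical snapshot of one person's access prerequisites."""
    in_access_groups: bool  # access control group grants have been approved
    account_deployed: bool  # per-user account CL has been rolled out to production


def can_access_ca_systems(state: AccessState) -> bool:
    # Both conditions are required; either one alone is not sufficient.
    return state.in_access_groups and state.account_deployed
```

Under this model, group membership without a deployed user account corresponds to `AccessState(in_access_groups=True, account_deployed=False)`, for which the predicate is false.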

Process deviations

In this case, we had 3 deviations from our procedures:

  1. Membership to access control groups was requested and approved after training but before the examination was complete.

  2. The access ticket was created and approved after training but before the examination was complete.

  3. The access ticket was approved by a team manager who is authorized to approve standard access requests but not new hire access requests, which must be approved by the PA.
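As an illustration only, the sequencing part of these deviations amounts to a timestamp comparison: any approval recorded before the examination was completed is out of sequence. The sketch below is not GTS tooling; the function name and event labels are hypothetical, and the timestamps are the ones given in the Timeline section of this report.

```python
from datetime import datetime
from typing import Dict, List


def approvals_before_exam(exam_completed_at: datetime,
                          approvals: Dict[str, datetime]) -> List[str]:
    """Return the names of approvals recorded before the exam was completed."""
    return sorted(name for name, ts in approvals.items() if ts < exam_completed_at)


# Timestamps (UTC) taken from the Timeline section of this report.
exam_completed_at = datetime(2024, 9, 4, 12, 40)
approvals = {
    "group_membership_approved": datetime(2024, 8, 19, 15, 19),
    "access_ticket_approved": datetime(2024, 8, 19, 17, 18),
    "second_access_ticket_approved": datetime(2024, 9, 12, 16, 24),
}

out_of_sequence = approvals_before_exam(exam_completed_at, approvals)
```

A check of this kind only covers ordering. The third deviation (an approver without authority for new hire requests) would need a separate check on who approved.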

Issue detection

Upon completion of the examination by the new hire, a team member noticed that approvals had occurred out of sequence. Checks were performed and we verified that the new hire had not been granted access to CA systems between the initial approval and the time the issue was recognized. A bug was created to improve process documentation and consider additional controls.

Privileges granted

As noted earlier, membership in the access control groups alone does not confer access to CA systems. CA system access requires a rollout to occur that includes a configuration change to add a unique account per user. This configuration change was neither submitted nor rolled out, thus the new hire could not have gained access to CA systems.

All access to CA systems is gated with Multi-Party Authorization (MPA). Membership in the access control groups does confer the ability to view logs and approve MPA requests. We confirmed through a review of our audit logs that the new hire did not exercise these privileges.

Impact

There was no impact to certificate issuance or validation. The out-of-sequence access control group membership grants did not result in premature access to CA systems. Full audit logs were available, so we were able to confirm that no access was attempted. We have also confirmed that there was no way to use the memberships granted out of sequence to elevate privileges or access CA systems.

Timeline

All times are UTC.

2024-06-24

  • New hire’s first day at Google. They start their 3-week company-wide Google Technical Immersion program.

2024-07-17

  • 11:30 New hire begins GTS’ onboarding process.

2024-08-13

  • 13:30 New hire completes GTS training.

2024-08-18

  • 22:42 New hire requests membership in access control groups.

2024-08-19

  • 15:19 Membership requests in access control groups are approved by the PA.
  • 16:28 New hire creates access ticket to formally track their addition to access control groups.
  • 17:18 Access ticket is approved by a team manager who is authorized to approve standard access requests but not new hire access requests, which must be approved by the PA.

2024-09-04

  • 12:40 New hire completes examination.
  • 12:56 Team member notices that group membership was granted prematurely.
  • 14:23 One engineering Technical Lead and two members of the PA discuss the sequencing issue and determine that no access to CA Systems was given or exercised. It was considered an internal process issue and an action item to file a bug to improve the process was assigned.

2024-09-09

  • 15:55 A bug is created to investigate improvements to the onboarding processes and controls with details about the out of sequence approvals.

2024-09-12

  • 16:09 A second access ticket is created in accordance with the documented procedure.
  • 16:24 The second access ticket is approved by an authorized member of the PA.

2024-09-25

  • 09:00 A CL adding the user account to access CA systems is submitted.

2024-09-26

  • 12:29 Beginning of the rollout deploying the CL adding the user account to CA systems.

2024-09-27

  • 15:42 End of the rollout deploying the CL adding the user account to CA systems.

2024-11-06

  • 19:58 During work on the previously created process improvement bug, it becomes clear that this was more of a near miss than initially thought. Based on this observation, further work is done to reassess the initial conclusion that it was only an internal process issue.

2024-11-07

  • 15:11 A review of audit logs concludes, an initial timeline is created, and both are shared with the PA. The PA begins discussing the issue in light of the new information.

2024-11-07 through 2024-11-13

  • The PA requests feedback and clarifications from team members.

2024-11-13

  • 19:44 The PA makes a determination to file a public incident to detail our learnings to help improve our processes and provide details that could help other CAs.
  • 20:15 Two team members draft a preliminary report.

2024-11-14

  • 19:31 GTS files this incident on Bugzilla.

Root Cause Analysis

The complexity and sequencing of the onboarding process allowed this issue to happen. It consists of several independent steps that include:

  • The new hire files their own access requests. Since they may lack strong familiarity with the process and tools, it would be best if the proctor files the access requests following the examination.
  • Multiple manual checks and approvals that can be done by different approvers in different tools.
  • The approval process did not hinge on an initial approval mechanism.

Based on the way our logical access management procedure is written, the decision to grant access is intended to hinge on the approval of the access ticket, but the steps were not executed in that order. A simpler process with correct ordering and fewer steps will reduce the likelihood of a mistake being made. Making the decision to grant privileged access requires some manual review, but additional manual work must also be limited.

Lessons Learned

What went well

  • The deviation was identified.
  • Our split access model (code changes + access control changes) prevented privileged access from being usable prematurely.

What didn't go well

  • We had expanded some approval groups to allow for faster responses. In this case, an expanded group had approval rights over an onboarding step that should have been restricted to PA approval, resulting in a process failure.
  • Our onboarding process could have had additional controls to better enforce process adherence.
  • Though this was identified quickly and had a timely initial investigation, the initial evaluation deemed this an internal process issue and the subsequent investigation was not prioritized.

Where we got lucky

N/A

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Revise our access grant process so the exam proctor files post-exam access requests | Prevent | 2024-12-06 |
| Re-evaluate and reduce the number of steps, approvals, and approval systems necessary for onboarding new team members | Prevent | 2024-12-20 |
| Simplify the sequencing of steps for onboarding new team members | Prevent | 2024-12-20 |

Appendix

Details of affected certificates

N/A - This was a procedural deviation and no certificates / issuance processes were affected.

GTS will monitor this bug for any questions or comments and will plan to provide an update on Action Items by 2024-12-06.

GTS continues to monitor this discussion for comments and questions. We kindly request that the "Next update" field be set to 2024-12-06, when our next Action Item is due.

Flags: needinfo?(bwilson)
Type: defect → task
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [policy-failure] → [ca-compliance] [policy-failure] Next update 2024-12-06

Google Trust Services (GTS) has completed the first Action Item below that was scheduled to be completed by 2024-12-06. GTS revised our access request procedure so that the exam proctor is the one to request access on behalf of a new hire. This change also improves the sequencing of the process so it is less likely an access request is approved prematurely. Additional work continues on other improvements.

GTS will continue to monitor this discussion for questions and comments and will provide another update by 2024-12-20 on the remaining two Action Items.

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Revise our access grant process so the exam proctor files post-exam access requests | Prevent | 2024-12-06 |
| Re-evaluate and reduce the number of steps, approvals, and approval systems necessary for onboarding new team members | Prevent | 2024-12-20 |
| Simplify the sequencing of steps for onboarding new team members | Prevent | 2024-12-20 |
Whiteboard: [ca-compliance] [policy-failure] Next update 2024-12-06 → [ca-compliance] [policy-failure] Next update 2024-12-23

Google Trust Services completed the remaining action items promised in our incident report. We completed changes to reduce manual steps involved in requesting and approving access to systems while mitigating risks related to the correct sequencing of the steps and premature access approvals. Automation to manage access group membership was tested and is now active, as are improvements to the training process.

To gain access to CA systems, a new hire to Google Trust Services must first satisfy three key prerequisites:

  1. The new hire undergoes a training program that culminates with a final training. Upon completion of the training, the trainer submits a request to add the new hire to a new access control group used to signify the training is complete. Group membership expires in 365 days. This new group does not confer access to any systems.

  2. The new hire completes an examination, which is now provided by a formal assessment system. This was formerly conducted orally. Examination results expire in the assessment system after 12 months.

  3. The new hire acknowledges through a web-based system that they have completed onboarding requirements and procedures. This was formerly written in an access ticket, which is no longer used.

Once the prerequisites have been met:

  1. An automated system used for access control management verifies that the prerequisites have been met. The access control management system submits a request to add the engineer to the access control groups. After process improvements made for this bug, fewer access control groups are used than before. These requests are sent to our Policy Authority (PA) who are the only individuals authorized to approve the requests.

  2. The PA validates the access control group requests by reviewing a report linked in each request, and then either approves or rejects them. In the case of rejection, the process stops. Otherwise, after the requests have been approved, the process continues.

  3. The new hire creates a changelist (CL) to add a unique user account for themselves on CA systems. The CL reviewer is required to confirm that membership into a key access group was approved before approving the CL. Once the CL is approved, the change is scheduled for deployment to production during the next rollout, which typically occurs on a weekly basis.

  4. After a rollout deploying the new user account has completed, the new hire is capable of accessing CA systems.
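The automated prerequisite check described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the system GTS deployed: the names `Prerequisites`, `prerequisites_met`, and `_is_current` are hypothetical, and the 12-month examination validity is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: both the training group membership (365 days) and the
# examination result (12 months) are treated as valid for 365 days.
VALIDITY = timedelta(days=365)


@dataclass(frozen=True)
class Prerequisites:
    """Hypothetical record of one new hire's onboarding prerequisites."""
    training_completed_on: Optional[date]  # None if training is not complete
    exam_passed_on: Optional[date]         # None if the exam is not passed
    acknowledged: bool                     # onboarding acknowledgment recorded


def _is_current(completed_on: Optional[date], today: date) -> bool:
    # A prerequisite counts only if it was completed and has not expired.
    return completed_on is not None and completed_on <= today < completed_on + VALIDITY


def prerequisites_met(p: Prerequisites, today: date) -> bool:
    # All three prerequisites must hold before any access request is submitted.
    return (_is_current(p.training_completed_on, today)
            and _is_current(p.exam_passed_on, today)
            and p.acknowledged)
```

With the dates from this incident, a check run on 2024-08-19 (training complete, examination not yet taken) would return false, so no access control group request would have been generated at that point.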

Another benefit of using this automated system is that it enforces training renewals. When membership in the group signifying training completion is nearing expiration, the system sends warnings to remind the engineer to retake the training. If membership in the group lapses or the examination is not successfully retaken, access control group membership removal requests are automatically sent to the Policy Authority, who may remind the engineer to retake their training promptly or revoke their access.
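The renewal behavior described above can be sketched as a small decision function. This is illustrative only: `renewal_action` and its return labels are hypothetical, and the 30-day warning window is an assumption, since the report does not state how far in advance warnings are sent.

```python
from datetime import date, timedelta

MEMBERSHIP_LIFETIME = timedelta(days=365)  # training group membership lifetime
WARNING_WINDOW = timedelta(days=30)        # assumption: not stated in the report


def renewal_action(training_completed_on: date, today: date) -> str:
    """Return "ok", "warn", or "request_removal" for a training group membership."""
    expires_on = training_completed_on + MEMBERSHIP_LIFETIME
    if today >= expires_on:
        # Membership has lapsed: a removal request goes to the PA, who decides
        # whether to remind the engineer or revoke access.
        return "request_removal"
    if today >= expires_on - WARNING_WINDOW:
        # Membership is nearing expiration: remind the engineer to retake training.
        return "warn"
    return "ok"
```

Note that in this sketch a lapse produces a request for the PA rather than an automatic revocation, which matches the description that the Policy Authority makes the final decision.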

This completes all action items.

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Revise our access grant process so the exam proctor files post-exam access requests | Prevent | 2024-12-06 |
| Re-evaluate and reduce the number of steps, approvals, and approval systems necessary for onboarding new team members | Prevent | 2024-12-20 |
| Simplify the sequencing of steps for onboarding new team members | Prevent | 2024-12-20 |

If there are no comments or questions, we kindly request for this bug to be closed.

Flags: needinfo?(bwilson)

Hi,

Thanks for this great final update, but could you also post a very brief closing summary?

A closing summary should briefly:

  • describe the incident, its root cause(s), and remediation;
  • summarize any ongoing commitments made in response to the incident; and
  • attest that all Action Items have been completed.

Here is a markdown template:

Incident Report Closure Summary

  • Incident Description: [Two or three sentences summarizing the incident.]
  • Incident Root Cause(s): [Two or three sentences summarizing the root cause(s).]
  • Remediation Description: [Two or three sentences summarizing the incident's remediation.]
  • Commitment Summary: [If applicable, a few sentences summarizing ongoing commitments made in response to this incident.]

Thanks again,

Ben

Whiteboard: [ca-compliance] [policy-failure] Next update 2024-12-23 → [ca-compliance] [policy-failure]

Ben, here is the requested Incident Report Closure Summary

Incident Description:

During the onboarding process for a new hire, we identified that some access control group memberships were granted in a different order than our procedures specify, which resulted from multiple process deviations. Other controls prevented any premature access, but there were opportunities for improvement.

Incident Root Cause(s):

The complexity and sequencing of the onboarding process were the root cause. It was expected that events happen in a certain order. There were controls to prevent premature access, but a lack of controls to enforce the documented procedure.

Remediation Description:

Three changes were implemented to address the process and tooling gaps that were identified:

  1. GTS revised our access grant process so the exam proctor files post-exam access requests.
  2. GTS re-evaluated and reduced the number of steps, approvals, and approval systems necessary for onboarding new team members.
  3. GTS simplified the sequencing of steps for onboarding new team members.

Commitment Summary:

No specific ongoing commitments, beyond the completed action items, were made in response to this incident. GTS remains committed to our continual improvement efforts and we will implement any additional improvement opportunities we identify.

Flags: needinfo?(bwilson)

GTS is monitoring this for further questions and comments. If there are none, we kindly request that this be closed out.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED