Closed Bug 1729097 Opened 3 months ago Closed 1 month ago

Google Trust Services: Delayed publication of CPS removing DNS Operator Exception

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: offline, Assigned: offline)

Details

(Whiteboard: [ca-compliance] Next update 2021-09-24)

On 2021-08-31 a member of the Mozilla dev-security-policy mailing list identified a delayed publication of updates to our CPS to align it with behavior changes that were deployed to ensure compliance with CABF Ballot SC46.

A full report will be posted in the coming days.

Assignee: bwilson → offline
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

1. How your CA first became aware of the problem

During public discussion about bug 1675821, a member of the community identified a delayed publication of updates to our CPS to align it with behavior changes that were deployed to ensure compliance with CABF Ballot SC46.

2. A timeline of the actions your CA took in response.

YYYY-MM-DD HH:MM (UTC) Description
2017-03-08 00:00 UTC Ballot 187 enters into effect mandating CAA checking with the exception of DNS Operators.
2021-04-29 17:54 UTC Change to remove the DNS Operator Exception for CAA is flagged by the Policy Authority for review by Engineering.
2021-05-05 12:31 UTC Work on a full review and update of our CPS begins, which includes removal of the DNS Operator Exception.
2021-05-13 20:58 UTC Change submitted to remove DNS Operator Exception from our systems.
2021-06-15 23:59 UTC Delivery of automation items related to Bug 1708516.
2021-06-22 10:32 UTC New CPS version is sent for review including the DNS Operator Exception removal.
2021-07-12 00:00 UTC Ballot SC46 on subsetting the CAA Exception for DNS Operators enters into effect.
2021-08-11 00:00 UTC New CPS gets final approval and is published.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

We did not stop issuance as the issuance practices in use were covered by the CPS.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.)

We have reviewed all issuance that took place since the code changes supporting SC46 entered into effect (2021-05-13) through the publication of the CPS (2021-08-11) and in 100% of those cases SC46-compliant CAA checks were successfully performed.

5. In a case involving certificates, the complete certificate data for the problematic certificates.

N/A

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

On 2017-03-08, Ballot 187 entered into effect, making CAA checking mandatory for all CAs. The ballot included an optional allowance for skipping CAA checking if the CA or an affiliate of the CA is the domain's DNS Operator. We historically used this provision to skip CAA checking for a small subset of the certificates we issued, where Google was the DNS Operator.

A discussion to remove the allowance for this method was initiated in the CA/B Forum, and as a result Ballot SC46 was proposed. Our change review process flagged this for engineering review on 2021-04-29, and we subsequently removed the option to use the exception. The method was last used 14 days later on 2021-05-13, which was 60 days before the ballot changes went into effect.
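The intervals quoted above can be recomputed from the dates in the timeline. The following is a small arithmetic check only; the variable names are illustrative and are not taken from any internal tooling.

```python
from datetime import date

# Dates copied from the timeline in section 2 of this report.
flagged_for_review = date(2021, 4, 29)  # change flagged by the Policy Authority
exception_removed = date(2021, 5, 13)   # DNS Operator Exception removed from systems
sc46_effective = date(2021, 7, 12)      # Ballot SC46 enters into effect
cps_published = date(2021, 8, 11)       # updated CPS published

days_flag_to_removal = (exception_removed - flagged_for_review).days
days_removal_to_effective = (sc46_effective - exception_removed).days
days_effective_to_publication = (cps_published - sc46_effective).days

print(days_flag_to_removal)           # 14
print(days_removal_to_effective)      # 60
print(days_effective_to_publication)  # 30
```

The last figure is the length of the publication delay this report concerns: the CPS was published 30 days after SC46 took effect.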

On 2021-05-05 we began our annual CPS update, which bundled an update to Section 4.2.4 to remove the reference to the CAA checking exception for the DNS Operator, along with several other updates.

We also had multiple incidents open at that time (Bug 1708516, Bug 1706967, Bug 1709223, Bug 1715421, Bug 1652581) and were making process improvements detailed in Bug 1708516 Comment 44. These updates all involved a number of additional partner teams and stakeholders as reviewers, with the goal of ensuring we covered all CPS updates and improvements in one pass. Combined, these factors delayed the review and publication of an updated version of our CPS.

When SC46 was published on 2021-06-02, it triggered another review, the conclusion of which was that the change had been addressed on 2021-05-13. This determination was made based on the records captured in the bug tracking the associated code changes and deployment which had been marked fixed, and because the CPS was in the process of being approved for release.

The aforementioned manual process to approve updates to our CPS resulted in further delays in its publication. The impending deadline to publish an update for SC46 was not flagged because we lacked a control to alert us to the CPS change deadline and had insufficient controls in place to coordinate code changes with the related documentation changes. This was the root cause of this issue.

On review of past incidents, this may appear to be related to Bug 1706967; however, that incident was the result of a scheduling issue that prevented a full review of BR changes from being completed. In the case of this issue, the review was completed in a timely manner, but the CPS changes had not yet been approved due to the factors mentioned earlier.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

To prevent this issue from occurring again in the future, we plan to introduce changes so updates to our CPS are published by the time new requirements become effective.

To that end:

  • We will further update our processes to require that the tracking bug for compliance documentation revisions include a machine-parseable publication deadline matching the effective date of any requirements changes, and that this bug be linked to the tracking bug for any related software change. This process change will be implemented immediately.
  • We will leverage our ticketing system to proactively alert us to impending deadlines for documentation updates. This change will be implemented by 2021-09-17.
  • We will conduct another internal review of all CA/B Forum ballots since SC3 to double-check that all required changes have been made and are accurately reflected in our CPS. This will be completed by 2021-09-24.
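
A minimal sketch of what a machine-parseable deadline and an impending-deadline check might look like. The tag format `[publication-deadline:YYYY-MM-DD]`, the `DocBug` type, and the 14-day warning window are all assumptions made for illustration; the report does not specify the format or the ticketing system's API.

```python
import re
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical tag format; not specified anywhere in this report.
DEADLINE_TAG = re.compile(r"\[publication-deadline:(\d{4}-\d{2}-\d{2})\]")


@dataclass
class DocBug:
    """Hypothetical tracking bug for a compliance documentation revision."""
    bug_id: int
    whiteboard: str
    closed: bool = False
    linked_code_bugs: List[int] = field(default_factory=list)


def parse_deadline(bug: DocBug) -> Optional[date]:
    """Return the publication deadline from the whiteboard, or None if absent."""
    match = DEADLINE_TAG.search(bug.whiteboard)
    return date.fromisoformat(match.group(1)) if match else None


def needs_alert(bug: DocBug, today: date,
                warn_window: timedelta = timedelta(days=14)) -> bool:
    """True if an open bug is inside the warning window or has no deadline tag."""
    if bug.closed:
        return False
    deadline = parse_deadline(bug)
    if deadline is None:
        return True  # assumption: a missing deadline is itself worth flagging
    return today >= deadline - warn_window


cps_bug = DocBug(1, "[publication-deadline:2021-07-12]", linked_code_bugs=[2])
print(needs_alert(cps_bug, date(2021, 6, 1)))  # False: more than 14 days remain
print(needs_alert(cps_bug, date(2021, 7, 1)))  # True: inside the warning window
```

Treating a missing tag as an alert condition is a design choice in this sketch, so that a documentation bug cannot silently escape deadline tracking.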
Whiteboard: [ca-compliance] → [ca-compliance] Next update 2021-09-24

Based on a response from a community member to the dev-security-policy mailing list, it seems there is an opportunity to clarify what the aforementioned action items accomplish.

We also want to stress that we believe the gap between making code changes and publishing CPS updates to reflect relevant changes should be as small as possible, whether those changes are driven by requirements changes or not.

Specifically:

1) By linking a tracking bug for documentation updates to the tracking bug for software updates, the work will not be tracked as delivered until all related bugs are closed.

Since our engineering processes are designed to encourage rapid completion of open bugs, this provides an impetus for engineers to ensure the related policy and documentation work is driven to not only completion but also publication.

Additionally, the bug tracking system ensures that any blocking bugs tied to a bug are closed before the dependent bug can be closed, further supporting this goal.

2) We would also like to clarify that the deadlines mentioned in the second action item were not intended to be limited by the effective date of BR/MRSP changes. We may, for example, set deadlines to deliver changes in advance of the effective date.

3) Specifically, the plan is to align the deadline for the associated tracking bug with the planned deployment date.
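
The blocking relationship described in item 1) can be sketched as follows. This is an illustrative model only: the `Bug` and `Tracker` types are hypothetical, and it assumes the software bug is the dependent bug and the documentation bug is its blocker.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bug:
    bug_id: int
    title: str
    closed: bool = False
    blocked_by: List[int] = field(default_factory=list)


class Tracker:
    """Toy tracker that refuses to close a bug while any blocker is open."""

    def __init__(self) -> None:
        self._bugs: Dict[int, Bug] = {}

    def add(self, bug: Bug) -> None:
        self._bugs[bug.bug_id] = bug

    def open_blockers(self, bug_id: int) -> List[int]:
        bug = self._bugs[bug_id]
        return [b for b in bug.blocked_by if not self._bugs[b].closed]

    def close(self, bug_id: int) -> bool:
        """Close the bug if no blockers remain open; report whether it closed."""
        if self.open_blockers(bug_id):
            return False
        self._bugs[bug_id].closed = True
        return True


tracker = Tracker()
tracker.add(Bug(101, "Publish CPS update"))
tracker.add(Bug(100, "Remove exception in code", blocked_by=[101]))

print(tracker.close(100))  # False: the documentation bug is still open
print(tracker.close(101))  # True
print(tracker.close(100))  # True: all blockers are now closed
```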

For CP/CPS changes that aren’t motivated by BR/MRSP changes, the same tracking method will be used. As an example, the addition of another CAA Issuer Domain Name to our CP/CPS would be prepared in draft form in documentation at the same time as a code change, and the documentation tracking deadline would typically be updated to reflect the deadline to deliver the code change. Therefore the CPS will be updated within a reasonable timeframe.

As part of this change we will also leverage an existing review process to ensure that when new bugs are entered into our tracking system, bugs with compliance or CP/CPS disclosure relevance are identified during analysis, resulting in creation of a documentation update tracking bug as previously described.

With all that said, as it relates to this incident, we would like to emphasize that in this particular case the code change removing the runtime allowance for the exception was in place, and the supporting CPS changes were drafted and in review, well before the BR deadline.

Please let us know if you have any questions.

In accordance with the timeline provided, we have deployed tooling to leverage our ticketing system to alert of impending deadlines for documentation updates. We improved on our original plan by also having it automatically assign bugs.

The tool works by having an automated agent monitor bugs maintained within our internal tracking system. The agent will automatically assign the bug to the on-call engineer from the team if it is not already being worked on by a designated start date, typically the same start date as the related code change. During an on-call shift, it is the responsibility of the engineer to ensure work in their queue is completed or delegated to an appropriate teammate. Once assigned, an automatic notification is sent to nudge the assignee and notify the team any time there is no activity on the bug.

This will help ensure the bug is worked on and completed in time, and will ensure visibility for the entire team any time there is a delay. In addition, this automation will help us to track our completion time and continually improve.
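
The agent behavior described above might be sketched roughly as follows. The `TrackedBug` type, the `run_agent` function, and the three-day inactivity threshold are assumptions for illustration; the actual tool and its thresholds are not described in this report.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class TrackedBug:
    bug_id: int
    start_date: date       # designated start date for the work
    last_activity: date
    assignee: Optional[str] = None
    closed: bool = False


def run_agent(bugs: List[TrackedBug], on_call: str, today: date,
              idle_limit: timedelta = timedelta(days=3)) -> List[str]:
    """One pass of the agent: assign overdue unowned bugs, nudge idle ones."""
    actions: List[str] = []
    for bug in bugs:
        if bug.closed:
            continue
        if bug.assignee is None:
            # Not yet picked up: assign to the on-call engineer once the
            # designated start date has arrived.
            if today >= bug.start_date:
                bug.assignee = on_call
                bug.last_activity = today  # assumption: assignment counts as activity
                actions.append(f"assigned bug {bug.bug_id} to {on_call}")
            continue
        if today - bug.last_activity >= idle_limit:
            actions.append(
                f"nudged {bug.assignee} and notified team about bug {bug.bug_id}"
            )
    return actions
```

Running such a pass on a schedule (for example, daily) would produce both the automatic assignment and the inactivity notifications in one place.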

We will follow up with another update next week once we complete our other action item.

Google Trust Services is monitoring this bug for any additional updates or questions. We will also be posting an update on the remaining action item.

We have completed an internal review of all CABF ballots since SC3, in accordance with the timeline provided.

Our review has resulted in several updates to our CPS. The changes made were not material and would not require an incident report. CPS version 4.4 containing the updates has been published to our repository at https://pki.goog/

Google Trust Services is monitoring this bug for any additional updates or questions.

Dear Ben,

We have completed all remediation steps, and it appears that there are no concerns raised by the community. We believe that we can move forward by marking the bug as Fixed. Can you please advise? We are making this request by following your guidance at https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/BOwcbWbZTg0/m/ceLXGwgEBwAJ

Thank you

Flags: needinfo?(bwilson)

I'll schedule this to be closed next Wednesday, 20-Oct-2021.

Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED