Closed Bug 1770510 Opened 2 years ago Closed 2 years ago

Google Trust Services: Failure to provide preliminary report within 24h

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cadecairns, Assigned: cadecairns)

Details

(Whiteboard: [ca-compliance] [disclosure-failure])

On 2022-05-20, Google Trust Services was reviewing previous Certificate Problem Report responses and discovered a potential issue with the tool we use for handling emails. We identified that a response we sent containing the preliminary findings of our investigation may not have been received by the subscriber. The investigation of this certificate problem report ultimately did not lead to a revocation, but we need to look through logs managed by a third party to determine if other recent reports may have encountered a similar issue.

We are conducting a full technical investigation. A full report will be posted within the next seven days.

Apologies, late Friday afternoon mistake on the title.

Summary: Google Trust Services: Failure to provide preliminary incident report within 24h → Google Trust Services: Failure to provide preliminary report within 24h
Assignee: bwilson → cadecairns
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

1. How your CA first became aware of the problem

A Google Trust Services CA Engineer noticed that the preliminary report with our findings in response to a Certificate Problem Report (CPR) may not have been sent when intended.

2. A timeline of the actions your CA took in response.

YYYY-MM-DD HH:MM (UTC) Description
2022-05-06 06:47 Certificate issued.
2022-05-06 15:31 A request to revoke the certificate was received from a third-party reporter using the contact form at https://pki.goog/faq/.
2022-05-06 17:25 CA Engineer 1 responds to the reporter with the outcome of our initial investigation, via our case management tool.
2022-05-12 16:37 While performing standard monitoring of open correspondence for replies, CA Engineer 2 discovers that the message may not have been sent correctly and resends it. Details of the uncertainty are explained later in the incident report.
2022-05-18 13:00 Based on preliminary investigations using data immediately available to GTS, a determination is made that the reporter and/or subscriber may not have been notified. GTS starts preparing for an incident while GTS CA Engineers try accessing logs from our third-party case management tool to confirm the suspected mis-delivery and determine whether the subscriber was notified.
2022-05-20 16:21 After failing to conclusively prove what happened via logs GTS had access to, an incident is declared in anticipation of logs from the case management tool team confirming GTS’ concerns.
2022-05-20 20:18 GTS contacted the third-party case management tool team to assist in root cause analysis.
2022-05-20 20:38 Mozilla preliminary incident report is filed in this bug. GTS continues evidence gathering and further investigation.
2022-05-25 18:05 While reviewing the logs and preparing this incident report, CA Engineers confirmed that the original response omitted the Subscriber. The Subscriber was notified of the CPR retroactively.
2022-05-26 11:00 GTS modifies its procedures for responding to requests and begins to investigate more comprehensive mitigations.
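
For readers who want to check the timing, the arithmetic implied by the timeline above can be sketched as follows. This is an illustration only, not GTS tooling. The 24-hour window comes from the bug title, the timestamps are copied from the timeline, and the variable and function names are hypothetical. Treating the 2022-05-25 18:05 entry as the earliest documented point at which the Subscriber was notified is an assumption, since the timeline does not give an exact notification time.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M"


def utc(ts: str) -> datetime:
    """Parse a timeline timestamp; all timeline times are given in UTC."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


# Timestamps copied from the timeline above.
cpr_received = utc("2022-05-06 15:31")
reporter_response = utc("2022-05-06 17:25")
# Assumption: earliest documented point at which the Subscriber was notified.
subscriber_entry = utc("2022-05-25 18:05")

# 24-hour window for the preliminary report, counted from receipt of the CPR.
deadline = cpr_received + timedelta(hours=24)


def within_deadline(sent: datetime) -> bool:
    """Return True if a message sent at `sent` falls inside the 24h window."""
    return sent <= deadline


print("deadline:", deadline.strftime(FMT))
print("reporter response after:", reporter_response - cpr_received)
print("reporter within 24h:", within_deadline(reporter_response))
print("subscriber entry after:", subscriber_entry - cpr_received)
print("subscriber within 24h:", within_deadline(subscriber_entry))
```

Under these assumptions, the response to the reporter falls well inside the window, while the Subscriber notification falls outside it by a wide margin.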

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

Google Trust Services has not stopped issuing certificates as this incident did not produce misissued certificates. We made a configuration change in our case management tool and revised the process we follow when handling CPRs to reduce the likelihood of this happening again while we seek a longer-term solution.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Google Trust Services did not find any issues with the certificate mentioned in the CPR, nor is it relevant to this particular incident.

5. In a case involving certificates, the complete certificate data for the problematic certificates.

N/A.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Google Trust Services uses a standardized pipeline to handle correspondence from users. Incoming messages from the request form at https://pki.goog/faq and from our public email address are processed by a third-party case management system for triage. A team separate from GTS performs an initial triage of these requests to determine their legitimacy, then hands legitimate ones off to GTS for follow-up action. Given the large volume of messages Google receives, this is the standard setup for triage of messages sent to public addresses.

We have been making our services available to more users and partners. Until recently, we averaged a lower number of Certificate Problem Reports each month. The recent increase in the volume of inbound messages revealed several issues with our case management software.

One major contributor to this increase in CPRs is that a large third-party partner has begun obtaining more GTS certificates as part of a new offering. This, paired with increasing use of CT monitoring tools, surprised some users when they were notified of new certificates provisioned on their behalf.

When handling the Certificate Problem Report in question, our pipeline worked as intended and a CA Engineer began an investigation within two hours, concluding that the request did not require revocation. In this case, the Subscriber was the third-party partner, and they were not successfully included in the correspondence containing our preliminary report because of a process breakdown that occurred when using the case management tool.

Investigation of this issue was hindered because the recipient field was inadvertently changed in our case management software when the message was moved between queues during its initial triage. This occurred when a CA Engineer responded to the request before it could go through its usual initial triage, and it was the source of the uncertainty referenced in the timeline.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

GTS has made changes to no longer use the case management tool to respond to correspondence from users. The GTS team will instead directly use the Google Groups UI to respond to any requests. In the short term, GTS will continue to use the case management tool for the monitoring, alerting, and triaging of reports.

In addition, we have amended our procedures to ensure all required parties are sent preliminary investigation emails in future. Technical controls were also introduced to prevent the triaging team from modifying the recipient field in the case management tool.

We have also added an explicit reminder in the relevant procedure to confirm the Subscriber is included in any communications about the outcome of an investigation of a CPR.
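
As an illustration only, the kind of check described by that reminder could be sketched in code as below. This is not GTS tooling or the case management tool's API. The `PreliminaryReport` type, the `missing_required_recipients` function, and the example.com addresses are all hypothetical, and the sketch assumes both the reporter and the Subscriber must receive the preliminary report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreliminaryReport:
    """Hypothetical record of an outgoing preliminary report for a CPR."""
    reporter: str               # address of the party who filed the CPR
    subscriber: str             # address of the certificate's Subscriber
    recipients: frozenset[str]  # addresses actually on the outgoing message


def missing_required_recipients(report: PreliminaryReport) -> set[str]:
    """Return the required addresses that are absent from the recipient list.

    Addresses are compared case-insensitively. An empty result means both
    the reporter and the Subscriber are included.
    """
    required = {report.reporter.lower(), report.subscriber.lower()}
    actual = {address.lower() for address in report.recipients}
    return required - actual


# A message addressed only to the reporter, as in the incident described above.
draft = PreliminaryReport(
    reporter="reporter@example.com",
    subscriber="partner@example.com",
    recipients=frozenset({"reporter@example.com"}),
)

missing = missing_required_recipients(draft)
if missing:
    print("Do not send yet; missing recipients:", sorted(missing))
```

A check of this shape would flag the message before sending rather than relying on later review of open correspondence.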

We are conducting an analysis of older communications, with the assistance of the case management tool team, to determine whether this problem arose in the past. We will share the outcome of this analysis in our next update, on or before 2022-06-03.

We worked with the case management tool’s maintainers to adjust the configuration and behavior of our CPR handling pipeline so that we avoid similar issues in the future. We are also working towards migrating to a new solution; this migration had already been planned but has not yet been finalized. We will provide more detail on the migration plan in our next update, on or before 2022-06-03.

Google Trust Services has concluded an analysis of older communications from the past 18 months, as promised in Comment 2. We have determined that within this period, there were an additional four CPRs for which the subscriber was not successfully included in the correspondence containing our preliminary report. All four cases resulted from the same process breakdown in the use of the case management tool that we described in the aforementioned comment. All four cases also arose under similar circumstances, where the reporter didn’t realize that a third-party service provider had requested issuance on their behalf.

We have begun design of a new solution that will provide a guided process for subscribers and other entities to submit CPRs. The primary goals of the solution are to reduce the cycle time to process a CPR by gathering relevant details up front, reduce the need for initial triage by eliminating the risk of spam, and ensure the correct parties are communicated with. We are investigating existing support solutions to satisfy our requirements. We plan to have a solution implemented by the end of 2022. However, the process and tool changes we described in our earlier comment will provide effective mitigation of the risk of this issue occurring again until the new, better solution is implemented.

Google Trust Services is continuing to monitor this bug for comments or questions.

As stated in our previous comment, we have made changes to effectively mitigate this issue until the new, better solution is implemented. If there are no comments or questions, we request consideration to close this bug.

Flags: needinfo?(bwilson)

I will close this on Wed. 15-June-2022.

> One major contributor to this increase in CPRs is that a large, third party partner has begun obtaining more GTS certificates as part of a new offering. This paired with increasing use of CT monitoring tools, surprised some users when they were notified of new certificates provisioned on their behalf.

I'm confused as to why this is happening. Doesn't Google's subscriber agreement forbid a third-party from obtaining certificates in this manner? From "Use and Restrictions":

> Subscriber will... (c) not use a Google-issued Certificate for or on behalf of any other person, organization, or entity

If this portion of the agreement is a mistake, did Google ever notify the actual domain owners, rather than just the subscriber? It sounds like the subscriber might not actually be the only entity who needs to be notified for any given certificate.

Google Trust Services is preparing a response to the recently added question. We are continuing to monitor this bug for comments or questions.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED

Any chance Google has an update? It looks like this was marked as resolved without an answer.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Google permits the use of Google-issued Certificates by authorized parties. This is a common practice, as many hosting and content providers manage certificates as part of service offerings on behalf of their customers. Per Section 3.2.2.4 of the BRs, the subscriber can only obtain a certificate on behalf of a third party where it has been authorized by the domain owner. Where there is such an engagement between the subscriber and the domain owner, the subscriber would be in communication with the domain owner and would be able to notify that domain owner accordingly.

Flags: needinfo?(bwilson)

I'll close this on Friday, 24-June-2022.

Indeed, that is a common practice, but it conflicts with your subscriber agreement. Could you update your subscriber agreement to match? Unless I'm misunderstanding, your subscriber agreement explicitly forbids this.

GTS has ongoing discussions with our legal counsel to improve GTS’ contractual documents, including the Subscriber Agreement. Counsel has reviewed the current version and found no issues with this particular use case. As part of our continuous review program we have prioritized our next review of the Subscriber Agreement to consider changes for clarity.

Flags: needinfo?(bwilson)

Thanks for the response, but I'm still a little confused. Since you've determined this use case doesn't violate the restriction in question, could you provide an example of a use case that does violate that clause? I understand you're working to improve the agreement, which I appreciate, but an informal clarification would be helpful during the interim.

Flags: needinfo?(cadecairns)

I'm still inclined to close this, and will do so on 8-July-2022. We seem to be going off-track here, because I'm unclear on the relevancy of the question in Comment #13 re: "a use case that does violate the clause". It seems to be rather open-ended and not typically the kind of question for follow-up on an incident report. Maybe it can be clarified and tied back into the overall subject of the incident report, which was the delayed response to certificate problem reports? Thanks.

Flags: needinfo?(dhelion)

Google Trust Services is continuing to monitor this bug for any further comments or questions.

Flags: needinfo?(cadecairns)
Status: REOPENED → RESOLVED
Closed: 2 years ago → 2 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [disclosure-failure]
Flags: needinfo?(dhelion)