Closed Bug 1719920 Opened 5 months ago Closed 3 months ago

Amazon Trust Services: Revocation Time for Intermediate Certificates

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: warnckeh, Assigned: warnckeh, NeedInfo)

Details

(Whiteboard: [ca-compliance] [delayed-revocation-ca])

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list (https://groups.google.com/a/mozilla.org/g/dev-security-policy), a Bugzilla bug, or internal self-audit), and the time and date.

Amazon Trust Services had two certificates for intermediates that were not in our audit reports but were in CCADB. This is covered in https://bugzilla.mozilla.org/show_bug.cgi?id=1713668. These certificates were created in October 2015 and were reissued in November 2015, when the original certificates were deleted, but they were not revoked at that time. These two deleted certificates were revoked on June 23, 2021. This report is about whether or not revocation was required sooner. Amazon Trust Services first considered revoking these certificates in Nov 2015; however, we determined it was not necessary to have them added to OneCRL or revoke them until the automated vetting of the audit reports made it clear that one certificate per key pair did not meet Mozilla’s policy.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Oct 21, 2015 - Amazon Trust Services issues intermediates that will be used to create the test certificates for its repository.
Nov 27-29, 2015 - Amazon Trust Services corresponds with Google regarding a question related to path building libraries.
Nov 30, 2015 - Amazon Trust Services determines that the dates on the intermediates created on Oct 21, 2015 may create undesirable behavior in certain browsers and decides to correct the dates associated with the key pairs to eliminate the identified path building issues. At this time it is also determined that this doesn’t meet the misissuance criteria and that revocation is not necessary.
Dec 3, 2015 - Amazon Trust Services corrects the dates associated with the previously generated key pair. The old certificates are deleted as previously described in https://bugzilla.mozilla.org/show_bug.cgi?id=1713668#c7.
February 2020 - While reviewing options for resolving ALV errors, it is determined that these certificates do not need to be revoked as they were not misissued.
April 2021 - While reviewing options for resolving ALV errors Amazon Trust Services determines that the easiest path forward for resolving the errors is to revoke the two certificates.
Jun 23, 2021 - Amazon Trust Services revokes the certificates.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Amazon Trust Services only used these intermediates once on Oct 21, 2015 to issue repository test certificates. They have not been used since.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

https://crt.sh/?sha256=80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50
https://crt.sh/?sha256=E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C

5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

https://crt.sh/?sha256=80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50
https://crt.sh/?sha256=E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The intermediates were known to exist but we determined during multiple reviews that they did not need to be revoked. We considered adding them to OneCRL, but determined that since the intermediates were only used once for repository test certificates in Oct 2015, revoking them was the best path forward and would not have any negative impact.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Amazon Trust Services has updated our process template for ceremonies to require revocation of certificates that are deleted from the HSM.

Assignee: bwilson → warnckeh
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [delayed-revocation-ca]

April 2021 - While reviewing options for resolving ALV errors Amazon Trust Services determines that the easiest path forward for resolving the errors is to revoke the two certificates.
Jun 23, 2021 - Amazon Trust Services revokes the certificates.

Can you explain the time gap between decision making and actual revocation?

Bug 1713668, Comment #1 (Dated 2021-05-31) stated Amazon's plan to revoke (by 2021-06-30), and successfully executed that on June 23.

So what I'm trying to square away is:

  • Why the delay from April to May to report the issue?
  • Why the delay from April to June to perform revocation?
Flags: needinfo?(warnckeh)

Why the delay from April to May to report the issue?
On April 6, 2021 Mozilla sent out a draft communication stating that they were going to request dates from CAs to deliver plans to address ALV issues. There was no set time frame or “due date” specified by Mozilla for these plans to be delivered. On April 30, 2021 (as part of the April 2021 responses) ATS provided a date to deliver this plan of May 31, 2021, which was the date we opened the related bug including the incident report. Waiting until May allowed ATS to do a thorough review of internal documents and communications from 2015 to early 2021 to fully understand the decisions made and actions taken around the two deleted intermediates during that period.

Why the delay from April to June to perform revocation?
While ATS successfully performed two in-person ceremonies during the COVID pandemic using appropriate precautions, this delay from April to June allowed the team for this specific ceremony to be fully vaccinated, which was the safest way for this revocation to be completed.

Amazon Trust Services continues to monitor this issue for any further comments or questions.

Flags: needinfo?(warnckeh)

(In reply to warnckeh from comment #2)

On April 30, 2021 (as part of the April 2021 responses) ATS provided a date to deliver this plan of May 31, 2021, which was the date we opened the related bug including the incident report. Waiting until May allowed ATS to do a thorough review of internal documents and communications from 2015 to early 2021 to fully understand the decisions made and actions taken around the two deleted intermediates during that period.

Can you be more specific and precise about this? I think it's still quite a concerning delay, and I'm not sure "We delayed because no one said we couldn't" (which is what this reply feels like) is particularly reassuring. It's also unclear why it would take nearly a month to perform that review.

In case it's not clear: It feels very much that Amazon does not take compliance or incident reporting seriously. Certainly, the number of incidents that have resulted from manual ceremonies is increasingly disconcerting, and it suggests an approach to controls that doesn't understand, or doesn't agree with, the seriousness required of a Root CA. The opportunity here is for ATS to provide greater transparency about precise details, to help reassure the community that, in fact, they do take things seriously. Or, as is both possible and unfortunately likely, it could just be that ATS doesn't treat all CA incidents as urgent and high priority matters, and that raises concerns about whether and if ATS will be able to do so in the future.

Why the delay from April to June to perform revocation?
While ATS successfully performed two in-person ceremonies during the COVID pandemic using appropriate precautions, this delay from April to June allowed the team for this specific ceremony to be fully vaccinated, which was the safest way for this revocation to be completed.

Why not disclose this then? Bug 1613668 shows concerns with the nature of Amazon's responses. Is there a plan to improve here?

Flags: needinfo?(warnckeh)

Addressing the request to more fully detail activities in the month of our investigation (April-May), the staffing of ATS has changed fully since 2015, which was when the certs in question were originally issued (and quickly re-issued). While the dates of issuance and re-issuance are easily found in our internal records, the question we faced in 2021 was why they were not revoked at the time of re-issuance in 2015, what steps were required by ATS at the time, and on what timeline. Answering those questions required us to review materials from people no longer in ATS.

We first evaluated whether the certificates were required to be revoked within a specific time frame. That evaluation was done in January 2020 and again in April 2021. During the previously referenced (https://bugzilla.mozilla.org/show_bug.cgi?id=1713668) January 2020 investigation, we determined that these certificates were duplicates, and there wasn’t anything inherently non-compliant about them that required them to be revoked within a specific time frame. The first action taken in April 2021 was to verify that these certificates were the same ones we had previously determined didn’t require immediate revocation. Once we determined that this didn’t fall under the seven-day revocation requirement, we wanted to determine why previous team members hadn’t revoked the certificates in the past. ATS staff has fully rotated since 2015, so it took some time to connect with previous team members and find notes on this subject that explained the decisions made in the past. Our findings from this investigation were that the team had intended to revoke the certificates in 2016 but later decided it was not necessary based on their best understanding of CA/B requirements at the time. Once we understood this, we decided that the best course of action was revocation at the soonest safe date. We made this incident report not because we thought it was a delayed revocation incident, but to provide transparency about the timelines and actions related to these intermediates.

We’ve made several changes to our processes over the years to better handle these types of situations. Following our previous bugs (https://bugzilla.mozilla.org/show_bug.cgi?id=1569266 and https://bugzilla.mozilla.org/show_bug.cgi?id=1525710), and in addition to the remediations discussed in those bugs, we also implemented a formal change review process where we assess changes to CA/B Forum requirements and root store programs. Last summer, as part of our review of the browser alignment ballot, we expanded this review process to more stakeholders. This year, as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1713668, we’ve also added CCADB changes to this review.

Bug 1613668, as linked, is not an ATS bug, but we understand it was in reference to https://bugzilla.mozilla.org/show_bug.cgi?id=1713668, which is a closed bug around ATS ALV errors, which was related to the above response.

Thanks Trev. I appreciate the extra details. I hope you can understand that comments such as Comment #6 reflect the reality that, when dealing with CA incident reports, it's not possible to "assume good intent", as that is subjective and individual; instead, we have to work on the information provided.

The information provided in Comment #7 seems to be revealing a disconnect, which persisted at least through the January 2020 investigation, and which was mentioned in Comment #0, reproduced below:

This report is about whether or not revocation was required sooner. Amazon Trust Services first considered revoking these certificates in Nov 2015; however, we determined it was not necessary to have them added to OneCRL or revoke them until the automated vetting of the audit reports made it clear that one certificate per key pair did not meet Mozilla’s policy.

In Mozilla Policy 2.4.1 (Published 2017-03-31), we see a hopefully unambiguous scope in Section 1.1, as including unexpired, unrevoked intermediate certificates. It incorporates, by reference, the CCADB Policy 2.4.1, which similarly defines a scope in Section 4. Section 5 then details the requirements, namely:

The entry for each intermediate certificate has "Audits Same as Parent" and "CP/CPS Same as Parent" checkboxes. When those are checked, the details do not need to be duplicated from the parent cert. However, the intermediate certificate must be specifically listed in the audit statements of the parent certificate.

So, at least by 2017, it should have been clear to Amazon that, at the minimum, they were in non-compliance with Mozilla Policy, and that the only acceptable remediation path towards compliance was the auditing of that intermediate or revocation.
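To illustrate the listing requirement quoted above, here is a minimal sketch of the kind of comparison involved: take the fingerprints of the intermediates disclosed for a root and report any that are missing from the audit statement. This is not how ALV is actually implemented; the function names and the placeholder "audited" fingerprint are hypothetical, and only the two SHA-256 fingerprints from this report are real.

```python
def normalize(fingerprint: str) -> str:
    """Canonicalize a SHA-256 fingerprint: drop colons/spaces, uppercase."""
    return fingerprint.replace(":", "").replace(" ", "").upper()


def missing_from_audit(disclosed, audited):
    """Return disclosed fingerprints that the audit statement does not list."""
    audited_set = {normalize(fp) for fp in audited}
    return sorted({normalize(fp) for fp in disclosed} - audited_set)


# Hypothetical placeholder standing in for an intermediate that IS audited.
AUDITED_PLACEHOLDER = "AA" * 32

# Intermediates disclosed in CCADB (assumed input for this sketch).
DISCLOSED = [
    "80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50",
    "E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C",
    AUDITED_PLACEHOLDER,
]

# Fingerprints listed in the audit statement (assumed input for this sketch).
AUDITED = [AUDITED_PLACEHOLDER]

for fp in missing_from_audit(DISCLOSED, AUDITED):
    print("not listed in audit statement:", fp)
```

Any fingerprint reported this way has the two remediation paths described above: get the certificate listed in an audit statement, or revoke it.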

In 2019, Kathleen Wilson of Mozilla again reiterated this expectation, and made it clear the remediation path for CAs that are in non-compliance, including the expectation of revocation. Included in that thread is an analysis about the revocation requirement that exists within the BRs, including the requirement to revoke within 7 days.

Thus, it stands that there are a few concerns here with this bug:

  • The failure, in 2017, to review these changes and the implications they had for the 2015 certificates.
    • Note: This was not a new requirement; Policy 2.3 (Effective 2013-07-26) already had similar language, noting:

      All certificates that are capable of being used to issue new certificates, and which directly or transitively chain to a certificate included in Mozilla’s CA Certificate Program, MUST be operated in accordance with Mozilla’s CA Certificate Policy and MUST either be technically constrained or be publicly disclosed and audited.

      The language introduced in 2.4.1 was about clarifying the appropriate demonstration of that (i.e. by listing in the audit)

  • The failure, in October 2019, to recognize these expectations.
  • The failure, in January 2020, to recognize the past clarifications as applicable to this case, and as addressed in other CA incident bugs.
  • The failure, in April 2021, to recognize the past clarifications as applicable to this case, and as addressed in other CA incident bugs.

Effectively, the revocation was always required, and so it seems useful to use this incident to understand why it wasn't detected until now, and look to better understand the changes that are being made to follow these discussions and clarifications, as well as to seek feedback for situations to help confirm ATS' understanding is correct. I acknowledge that Bug 1713668, Comment #4 addresses this last point, and to some extent, is emphasizing the points made in Bug 1713668, Comment #6.

While Bug 1713668, Comment #11 reflects my dissatisfaction at the resolution, I'm still quite concerned that comments like Comment #7 seem not to have internalized that there was a serious control failure in terms of following root program expectations, nor does it appear that there's been a meaningful remediation of processes to prevent similar situations in the future. The core concern here is the decision-making process employed by ATS. Bug 1713668, Comment #10 did not really provide much assurance that things have changed from the 2015 review or the subsequent 2020 review, and the problem seems to persist to this day, as highlighted by Comment #1.

Ryan,

ATS takes this issue very seriously. We recognize the importance of having sound certificate authorities and the impact they have on the internet. The original failure was due to a single individual making a judgement call. Our focus in our policy changes has been to prevent ever being in a similar situation again.

We have changed our policies such that any certificate created during a manual ceremony that will not be used will always be revoked. Further, we’ve moved to two-person control on policy interpretation: when looking at policies, we will always ensure they’re reviewed by two people.

Specifically addressing Oct 2019: while we did immediately check our ALV error list because our audit letter didn’t meet the formatting requirements, we didn’t catch that the number of errors didn’t match the number of intermediates listed until we were answering the Mozilla Communications in Jan 2020. While we reviewed the requirements for what actions to take, as discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1713668, it was a miss on our part that we didn’t follow up in Jan 2020.

If there are no further questions we would like to request that this bug be Resolved as Fixed.

Flags: needinfo?(bwilson)

I'll slate this for closure on Friday, 3-Sept-2021.

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED