Closed Bug 1713668. Opened 3 years ago; closed 3 years ago.

Amazon Trust Services: ALV Errors

Categories: CA Program :: CA Certificate Compliance, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: trevolip; Assigned: trevolip
Details: Whiteboard: [ca-compliance] [audit-failure]


Steps to reproduce:

Amazon Trust Services has two unused certificates for intermediates in Mozilla’s root program that aren’t on our audit reports. This causes the Audit Letter Validation (ALV) check to fail.

This compliance report isn’t about the contents or creation of the intermediate certificates but about ALV errors. However, here is background on the certificates for context:

Amazon Trust Services (we) created these intermediate certificates in Oct 2015. During a review of these certificates in Nov 2015, we identified two issues and decided to reissue corrected versions: the validity period of the certificates extended beyond that of the root, and the caIssuers field contained “crl” instead of the intended “crt”. In Nov 2015 we added checks to our system to prevent issuing certificates whose validity extends past that of the root, and we corrected the template for the caIssuers field. In Dec 2015 we deleted these certificates from our HSMs and issued updated versions.
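For readers less familiar with pre-issuance checks, the two defects above map onto two simple comparisons. The sketch below is illustrative only and is not ATS's actual system: the `IntermediateProfile` class, the `pre_issuance_problems` helper, the dates, and the example URLs are all hypothetical. It assumes the expected caIssuers value comes from a known-good template and that the issuer's notAfter is already known.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class IntermediateProfile:
    """Hypothetical, simplified view of an intermediate about to be signed."""
    not_after: datetime
    ca_issuers_url: str


def pre_issuance_problems(profile, issuer_not_after, expected_ca_issuers_url):
    """Return human-readable problems; an empty list means the profile passes.

    The two checks mirror the two defects described in the report:
    validity running past the root, and a caIssuers value that differs
    from the template.
    """
    problems = []
    if profile.not_after > issuer_not_after:
        problems.append("notAfter is later than the issuer's notAfter")
    if profile.ca_issuers_url != expected_ca_issuers_url:
        problems.append("caIssuers URL does not match the expected template value")
    return problems


# Illustrative values only (not real ATS data).
ROOT_NOT_AFTER = datetime(2038, 1, 1, tzinfo=timezone.utc)
EXPECTED_AIA = "http://crt.example.test/root.cer"

ok = IntermediateProfile(datetime(2030, 1, 1, tzinfo=timezone.utc), EXPECTED_AIA)
bad = IntermediateProfile(datetime(2040, 1, 1, tzinfo=timezone.utc),
                          "http://crl.example.test/root.cer")

print(pre_issuance_problems(ok, ROOT_NOT_AFTER, EXPECTED_AIA))       # []
print(len(pre_issuance_problems(bad, ROOT_NOT_AFTER, EXPECTED_AIA)))  # 2
```

Comparing against a full expected URL, rather than pattern-matching on "crt", avoids guessing where in the URL the typo occurred.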

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list (https://groups.google.com/a/mozilla.org/g/dev-security-policy), a Bugzilla bug, or internal self-audit), and the time and date.

Amazon Trust Services became aware of the issue when reviewing the Jan 2020 CA Communication. While comparing the certificate list in CCADB against the list in our audit reports, we identified that our reports were missing some certificates that are in CCADB.
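The comparison described above is essentially a set difference on SHA-256 fingerprints. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and it assumes the CCADB-side and audit-side fingerprint lists have already been exported or extracted by some other means (the code does not fetch anything). The two fingerprints are the ones listed in this bug; the empty audit-side list reflects that neither appeared in the audit reports.

```python
def normalize_fingerprint(fp: str) -> str:
    """Uppercase a SHA-256 hex fingerprint and drop colons and whitespace."""
    return "".join(ch for ch in fp if ch not in ": \t\r\n").upper()


def missing_from_audit(ccadb_fingerprints, audit_fingerprints):
    """Fingerprints disclosed in CCADB but absent from the audit report."""
    ccadb = {normalize_fingerprint(fp) for fp in ccadb_fingerprints}
    audited = {normalize_fingerprint(fp) for fp in audit_fingerprints}
    return sorted(ccadb - audited)


# Fingerprints from this bug; the audit-side list is a placeholder for
# whatever the audit letter actually lists (here, neither certificate).
ccadb_list = [
    "80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50",
    "E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C",
]
audit_list = []

for fp in missing_from_audit(ccadb_list, audit_list):
    # Same crt.sh link form used elsewhere in this report.
    print(f"https://crt.sh/?sha256={fp}")
```

Normalizing first matters because audit letters and databases do not always format fingerprints the same way (case, colons, line wrapping).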

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

January 30, 2020 – Amazon Trust Services contacts Mozilla for clarification on resolving formatting errors reported during ALV.
January 30, 2020 – Amazon Trust Services contacts our audit team about the ALV failures.
January 31, 2020 – Amazon Trust Services identifies a mismatch between the number of ALV errors and the number of certificates in CCADB and begins to investigate the mismatched certificates. We also begin the discussion with our auditors about appropriate remediation.
February 12, 2020 – Amazon Trust Services determines with our auditors that it’s not appropriate to add these certificates to the audit reports.
April 13, 2021 – Amazon Trust Services updates our audit reports in CCADB and notes that there are existing ALV errors that should be discussed at our next CA/B Forum and Trust Store policy review meeting on April 30, 2021.
April 30, 2021 – Amazon Trust Services finalizes answers for April 2021 CA Communications, including a plan for resolving ALV issues.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Amazon Trust Services deleted these certificates and stopped using them in Dec 2015.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Amazon Trust Services had two intermediate certificates that were issued but not listed in its audit reports:
https://crt.sh/?sha256=80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50
https://crt.sh/?sha256=E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C

5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

https://crt.sh/?sha256=80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50
https://crt.sh/?sha256=E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We should have opened a ticket with Mozilla in February 2020 to get clarity, following the Mozilla policy change requiring that all intermediate certificates in Mozilla’s root program be listed in every CA’s annual audit report.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

We will revoke the two intermediate certificates at issue by June 30, 2021.

We have a monthly review meeting where we review CA/B Forum and Trust Store policy changes. We have added reviewing the CCADB Summary page and the crt.sh Mozilla Disclosures to this meeting.

Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] Next update 2021-07-01

Listing intermediates in audit reports is a long-standing requirement of the Mozilla program. Mozilla Root Store Policy version 2.5, which has been in effect since June 23, 2017, states in section 3.1.4:

The publicly-available documentation relating to each audit MUST contain at least the following clearly-labelled information
[...]
3. Distinguished Name and SHA256 fingerprint of each root and intermediate certificate that was in scope;

Therefore, the timeline in Comment 0 is incomplete. Furthermore, the answer for question 6 is inadequate as it doesn't explain how Amazon Trust Services was unaware of this clearly-stated part of the MRSP. Combined with Bug 1713976 and Bug 1713978, it seems that there is a deeper problem at Amazon Trust Services with awareness of policy requirements. Amazon Trust Services should redo this incident report and provide much greater detail about their process for staying up-to-date with policy requirements, how the process failed to prevent this incident, and what Amazon Trust Services is doing to improve their awareness of policy. For example, if Amazon Trust Services regularly reviewed other CAs' incidents, as several other CAs now do, they may have been aware from Bug 1455150 Comment 8 that listing fingerprints of unused intermediates is required.
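The MRSP item quoted above asks for the SHA256 fingerprint of each certificate in scope, which is the value ALV matches against CCADB. As a minimal sketch, assuming the certificate is available as a single PEM or DER file (the file name in the comment is hypothetical), the fingerprint is the SHA-256 hash of the DER encoding, written here as uppercase hex to match the crt.sh links in this bug. This covers only the fingerprint; the Distinguished Name the policy also requires is not extracted here.

```python
import hashlib
import ssl


def sha256_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 over the DER-encoded certificate, as uppercase hex."""
    return hashlib.sha256(der_bytes).hexdigest().upper()


def fingerprint_from_pem(pem: str) -> str:
    """Convert a single PEM certificate to DER, then fingerprint it."""
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(pem))


def crtsh_url(fingerprint: str) -> str:
    """Build a crt.sh lookup link in the same form used in this bug."""
    return f"https://crt.sh/?sha256={fingerprint}"


# Example usage (hypothetical file name):
#   with open("intermediate.pem") as f:
#       print(crtsh_url(fingerprint_from_pem(f.read())))
```

Because every distinct certificate has a distinct DER encoding, two certificates that share a key pair still have different fingerprints, so each one has to be listed separately.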

Flags: needinfo?(trevolip)

Amazon Trust Services acknowledges this feedback and we are investigating.

Summary: Amazon Trust Services - ALV Errors → Amazon Trust Services: ALV Errors

Amazon Trust Services acknowledges this feedback. We will report on our findings and next steps in our next update.

Flags: needinfo?(trevolip)
Assignee: bwilson → trevolip

Amazon Trust Services has two unused certificates for intermediates in Mozilla’s root program that aren’t on our audit reports. This causes the Audit Letter Validation (ALV) check to fail.

This compliance report isn’t about the contents or creation of the intermediate certificates but about ALV errors. However, here is background on the certificates for context:

Amazon Trust Services (we) created these intermediate certificates in Oct 2015. During a review of these certificates in Nov 2015, we identified two issues and decided to reissue corrected versions: the validity period of the certificates extended beyond that of the root, and the caIssuers field contained “crl” instead of the intended “crt”. In Nov 2015 we added checks to our system to prevent issuing certificates whose validity extends past that of the root, and we corrected the template for the caIssuers field. In Dec 2015 we deleted these certificates from our HSMs and issued updated versions.

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list (https://groups.google.com/a/mozilla.org/g/dev-security-policy), a Bugzilla bug, or internal self-audit), and the time and date.

Amazon Trust Services became aware of the issue when reviewing the Jan 2020 CA Communication. While comparing the certificate list in CCADB against the list in our audit reports, we identified that our reports were missing some certificates that are in CCADB.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

October 2015 - Created intermediate certificates https://crt.sh/?id=18068141 and https://crt.sh/?id=12624945.
December 2015 - Deleted intermediate certificates https://crt.sh/?id=18068141 and https://crt.sh/?id=12624945 and issued corrected versions https://crt.sh/?id=11265963 and https://crt.sh/?id=11214585.
November 2016 - April 2017 - Ongoing Mozilla discussion of including roots and intermediates in audit reports.
March 28, 2017 - ATS, which was following the Mozilla discussion above regarding inclusion of intermediates, publishes its first audit report that includes the two corrected intermediates (and three other intermediates). The two deleted intermediates were not included: based on the Mozilla discussion at the time, the team determined that the deleted certificates were not in scope for the audit report because the corrected certificates shared the same key pair as the deleted ones (which ATS understood to be the criterion for inclusion at that time).
March 2018 - New audit reports covering 2017-2018 are created using the existing certificate list, based on the previous ATS decision.
March 2019 - New audit reports covering 2018-2019 are created using the existing certificate list, based on the previous ATS decision.
January 7, 2020 - Mozilla releases “January 2020 CA Communication” for CAs to respond to by January 31, 2020.
January 30, 2020 – As part of ATS’ response to the “January 2020 CA Communication”, Amazon Trust Services reaches out to Mozilla for clarification on resolving formatting errors reported during ALV.
January 30, 2020 – Amazon Trust Services contacts our audit team about the ALV failures.
January 31, 2020 – Amazon Trust Services identifies a mismatch between the number of ALV errors and the number of certificates in CCADB and begins to investigate the mismatched certificates. We also begin the discussion with our auditors about appropriate remediation.
February 12, 2020 – Amazon Trust Services determines with our auditors that it’s not appropriate to add these certificates to the audit reports.
April 13, 2021 – Amazon Trust Services updates our audit reports in CCADB and notes that there are existing ALV errors that should be discussed at our next CA/B Forum and Trust Store policy review meeting on April 30, 2021.
April 30, 2021 – Amazon Trust Services finalizes answers for April 2021 CA Communications, including a plan for resolving ALV issues.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Amazon Trust Services deleted these certificates and stopped using them in Dec 2015.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Amazon Trust Services had two intermediate certificates that were issued but not listed in its audit reports:
https://crt.sh/?sha256=80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50
https://crt.sh/?sha256=E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C

5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

https://crt.sh/?sha256=80DD9E3497F354E30B8ACF39D046DD4F5A618F7889236EB34F78D54D15CD6A50
https://crt.sh/?sha256=E39D3ED886E5A3AF26B9D6AB608028BC6FBC52E599CB323DA7E9E775B530337C

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We should have opened a ticket with Mozilla in February 2020 to get clarity, following the Mozilla policy change requiring that all intermediate certificates in Mozilla’s root program be listed in every CA’s annual audit report.

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

We will revoke the two intermediate certificates at issue by June 30, 2021.

As of June 10, 2021, ATS has added a weekly team meeting to review and triage bugs, discussions, and updates that require action or acknowledgement by us. This meeting feeds into the existing monthly ATS leadership discussion where we review and prioritize CA/B Forum and Trust Store policy changes. We have also added reviewing the CCADB Summary page and the crt.sh Mozilla Disclosures to the monthly meeting agenda.

Amazon Trust Services has completed revocation of these intermediates. If there are no further questions, we would like to request that this bug be Resolved as Fixed.

Flags: needinfo?(bwilson)

Like Andrew, I am still struggling with this incident report and with understanding how ATS missed expectations here.

  • Why wasn't there a revocation event in 2015, when these were deleted from HSMs?
  • What exactly does "deleted from HSMs" mean in this case, given the related discussion of issuing "corrected" versions? It would seem that this simply refers to the public certificates, not the private keys. Is that correct?
  • As Andrew pointed out in Comment #1, there were clear policy expectations set. I don't see a clear reference here to what ATS refers to as the contemporary discussion. There was quite a bit of activity in those early periods of 2017 related to undisclosed intermediates, including reporting on crt.sh, so it'd be useful to understand if there is a thread ATS felt was ambiguous?
  • ATS references the January 2020 CA communication, but this was also part of previous CA communications, such as April 2017 and November 2017.

ATS also mentions it changed its practice on 2021-06-10, but I don't understand how ATS also missed related discussions.

What I'm trying to understand is what other changes or discussions in the past four years has ATS missed, and how does the plan mentioned in Comment #4 meaningfully provide the assurance that this has been addressed? In particular, I'm concerned that the proposed remediation focuses on this specific instance, without touching the underlying fact that it seems a number of discussions, clarifications, and expectations were missed - then and now - and why it took Mozilla enabling ALV for intermediates to reveal this to ATS.

For comparison, consider Bug 1715455, Comment #27, which seems a similar situation of a CA overlooking relevant bugs and clarifications, and working on a plan to holistically address this.

Flags: needinfo?(bwilson) → needinfo?(trevolip)

Thank you Ryan,

First, we are going to create an incident report for not revoking in 2015. This will be posted next week.

Second, regarding “deleted from the HSMs,” your interpretation is correct. We deleted the incorrect public certificates that had been created. We made no changes to the private key.

Third, we didn’t miss the discussions in 2017. The determination at the time was that since there was already one certificate for the key pair in the audit document, that that met the scope. We now understand this prior understanding to be incorrect, and is not what we would do today.

Flags: needinfo?(trevolip)

Amazon Trust Services has created an incident report for not revoking the two intermediates in 2015. This report can be found here: https://bugzilla.mozilla.org/show_bug.cgi?id=1719920

(In reply to Trevoli (Amazon Trust Services) from comment #7)

Third, we didn’t miss the discussions in 2017. The determination at the time was that since there was already one certificate for the key pair in the audit document, that that met the scope. We now understand this prior understanding to be incorrect, and is not what we would do today.

Have there been any process changes to prevent similar incidents in the future?

The problem with taking these answers at face value is that they don't provide any insight into whether and how to prevent this in the future. Amazon made a mistake, it now knows better - but what's the process to prevent similar mistakes in the future? This is what Comment #6 was trying to highlight as a concern, but Comment #7 doesn't really give us any path forward. I think we can reasonably assume this interpretation, but what about the next interpretation? How do we close off this class of bugs?

Flags: needinfo?(trevolip)

In addition to the mechanisms described above (original Incident Report, Answer #7) that Amazon Trust Services has implemented to improve discovery and tracking of all policy changes and other requests or decisions that we may need to address, ATS has also improved internal visibility into these changes, especially those that might be considered judgement calls, so that a larger group of ATS business, engineering, and compliance leaders now formally evaluates each change and agrees on actions and next steps.

I'm not entirely sure Comment #10 really meets the expectations of https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report (specifically, the last two paragraphs). That is, it feels a bit like "root cause: internal visibility wasn't sufficient for matter of interpretation. solution: internal visibility improved". It doesn't give insight into how that can be reproduced by other CAs, what the considerations are, and how the risks are managed.

That said, I'm sending this to Ben for consideration in closing. I do think this incident report struggles to meet the minimum expectations, but I don't know that keeping it open would lead to a more productive outcome.

Flags: needinfo?(trevolip) → needinfo?(bwilson)

Amazon Trust Services continues to monitor this issue for any further comments or questions.

I'll schedule Friday, 30-July-2021, to close this bug, with the hope that we can improve and not have similar occurrences in the future.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update 2021-07-01 → [ca-compliance] [audit-failure]