Closed Bug 1718991 Opened 3 years ago Closed 3 years ago

Microsoft PKI Services, Malformed ICAs (Key Usage Malformed)

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: johnmas, Assigned: johnmas)

Details

(Whiteboard: [ca-compliance] [ca-misissuance])

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36 Edg/91.0.864.48

Type: defect → task

Incident Report

  1. How your CA first became aware of the problem.

Microsoft PKI Services has identified four (4) Intermediate CAs that were mis-issued with malformed Key Usage extensions. We became aware of this issue on 24 June 2021 at 01:10 PM (Pacific Time), when the team manually inspected the newly created certificates during the live certificate generation process.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Note: Times are listed in the Pacific time zone.
• 13 May 2021 03:59 PM – Bugzilla bug created to report eight (8) malformed ICAs with missing certificate policy extensions (1711147 - Microsoft PKI Services: Malformed ICAs (missing certificate policy extensions) (mozilla.org)).
• 22 June 2021 – Live templates created in our internal tools.
• 24 June 2021 01:03 PM – Four (4) Intermediate CA certificates issued from our Microsoft RSA Root Certificate Authority 2017. This Root certificate is in an Offline environment segregated from any networks.
• 24 June 2021 01:08 PM – Team identified during manual inspection of the certificates that the Key Usage field was malformed.
• 24 June 2021 01:18 PM – Revoked four (4) mis-issued ICA certificates from 24 June 2021.
• 24 June 2021 01:45 PM – Identified issue in template and repaired template.
• 24 June 2021 01:57 PM – Issued four (4) correctly formed ICA certificates from the RSA root.
• 24 June 2021 05:17 PM – Published updated CRL to revoke 4 mis-issued ICA certificates from 24 June 2021. http://www.microsoft.com/pkiops/crl/microsoft rsa root certificate authority 2017.crl

  3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

We stopped issuance via our offline CA systems and processes, as this is where this issue arose. We have now identified the root cause as being related to the way our internal software tools configure templates. Once the issue was fixed on the RSA root, we resumed offline issuance; nevertheless, we have stopped the creation of new templates until all remediations are completed.

  4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

There are four (4) certificates that we have created with this issue. All four (4) have now been revoked. All four (4) certificates were created on 24 June 2021. All four (4) of these CAs were revoked within minutes of creation and had not yet been used to issue subscriber certificates. No other ICA certificates with this issue have been issued since.

All four (4) ICA certificates have been attached to this bug.

  5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

See attachments for each of the four (4) certificates.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Upon seeing the malformed key usage in the issued certificates and revoking them immediately, the team began the investigation into root cause.

The root cause of this mis-issuance was related to the new certificate templates that the team instituted on this Root CA. These templates are configured using our own internal software and WebUI. We decided to add these templates as a defense-in-depth measure to remediate another ICA mis-issuance bug (1711147 - Microsoft PKI Services: Malformed ICAs (missing certificate policy extensions) (mozilla.org)) that Microsoft PKI Services currently has open. We installed and tested the templates first on our test roots (one RSA and one ECC) and then on our live root certificates (one RSA and one ECC).

Our internal software and UI for creating these templates requires each template to be individually configured for each CA. So, the team had a ceremony and followed it to create the template on each of the four CAs described above (test and live, RSA and ECC roots). The configuration and setup of these templates went well on 3 of the 4 CAs but had an issue specific to the Key Usage on the RSA live root. The UI for our internal software was used during the configuration process to ensure the templates were configured identically and the UI indicated they were all configured identically.

When the team realized that there was a problem with the certificates issued from the live RSA root, they dug further into our internal software’s database and found that the Key Usage blob on that CA was malformed: instead of the Base64-encoded value AwIBhg==, it was listed as "1". The team is not sure how the Key Usage field in the template became malformed, but suspects it was a typing mistake made while configuring the template.
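To make the difference between the two stored values concrete, here is a minimal Python sketch that decodes a Base64 value as a DER BIT STRING and maps the set bits to the Key Usage names defined in RFC 5280. The function name `decode_key_usage` is hypothetical, and the sketch assumes the stored blob is the DER encoding of the extension value; it is not taken from the internal tooling described above.

```python
import base64

# Key Usage bit names in RFC 5280 order (bit 0 first).
KEY_USAGE_BITS = [
    "digitalSignature", "nonRepudiation", "keyEncipherment",
    "dataEncipherment", "keyAgreement", "keyCertSign",
    "cRLSign", "encipherOnly", "decipherOnly",
]


def decode_key_usage(b64_value: str) -> list:
    """Decode a Base64 DER BIT STRING into a list of Key Usage names.

    Hypothetical helper for illustration only. Raises ValueError when the
    input is not valid Base64 or not a short-form DER BIT STRING.
    """
    der = base64.b64decode(b64_value, validate=True)
    # Tag 0x03 = BIT STRING, then a one-byte length, then the unused-bit count.
    if len(der) < 4 or der[0] != 0x03 or der[1] != len(der) - 2:
        raise ValueError("not a non-empty DER BIT STRING")
    unused = der[2]
    if unused > 7:
        raise ValueError("invalid unused-bit count")
    payload = der[3:]
    total_bits = len(payload) * 8 - unused
    names = []
    for i in range(min(total_bits, len(KEY_USAGE_BITS))):
        # Bit 0 is the most significant bit of the first payload byte.
        if payload[i // 8] & (0x80 >> (i % 8)):
            names.append(KEY_USAGE_BITS[i])
    return names


print(decode_key_usage("AwIBhg=="))
# ['digitalSignature', 'keyCertSign', 'cRLSign']
```

Under these assumptions, AwIBhg== decodes to the bytes 03 02 01 86, while the string "1" is not valid Base64 at all, so `decode_key_usage("1")` raises `ValueError` instead of returning a list.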

Once they identified the problem with the template, they were able to fix the configuration and successfully issue four (4) live Intermediate CAs with the updated template.

We also want to address a very similar bug (1711147 - Microsoft PKI Services: Malformed ICAs (missing certificate policy extensions) (mozilla.org)) that Microsoft PKI Services currently has open. This mis-issuance has a different root cause, as discussed above. We would like to point out that the improvements we added to this process because of the recent bug helped during this issue. The manual checks that were added helped to immediately identify that these certificates had a problem, and we were able to fail fast.

  7. List of steps your CA is taking to resolve the situation and ensure that such a situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Completed Remediations:
• All four (4) certificates were revoked within minutes of their creation and the CRL was published a few hours later (24 June 2021).
• Within hours of the issue the team identified the problem with the template that was created on the live RSA Root CA and fixed the issue. The team was able to successfully issue the live certificates once the template was corrected (24 June 2021).

Open Remediations:
• We will update our Template creation process to compensate for the shortcomings of our internal template tools as identified in this bug. We will add steps to further interrogate the database in our internal tools to check the configuration of the template before it is used. We will not configure any new templates until this process is updated. (21 July 2021).
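A check of the kind described in this remediation could look roughly like the following Python sketch. It is an illustration only: the `templates` mapping, the CA labels, and the function name `find_bad_templates` are hypothetical stand-ins for whatever the internal database actually exposes. The only value taken from this report is the expected Key Usage blob AwIBhg==.

```python
import base64

# Value this report gives as the correct stored Key Usage blob.
EXPECTED_KEY_USAGE = "AwIBhg=="


def find_bad_templates(templates: dict, expected: str = EXPECTED_KEY_USAGE) -> dict:
    """Return {ca_name: reason} for every CA whose stored value is wrong.

    `templates` maps a CA label to the raw Key Usage string read from the
    template database (a hypothetical shape, assumed for this sketch).
    """
    expected_raw = base64.b64decode(expected, validate=True)
    problems = {}
    for ca_name, stored in templates.items():
        try:
            raw = base64.b64decode(stored, validate=True)
        except ValueError:
            # binascii.Error is a subclass of ValueError.
            problems[ca_name] = f"not valid Base64: {stored!r}"
            continue
        if raw != expected_raw:
            problems[ca_name] = f"unexpected value: {stored!r}"
    return problems


# Hypothetical snapshot of the four template records before issuance.
templates = {
    "test-rsa": "AwIBhg==",
    "test-ecc": "AwIBhg==",
    "live-rsa": "1",
    "live-ecc": "AwIBhg==",
}

for ca_name, reason in find_bad_templates(templates).items():
    print(f"{ca_name}: {reason}")
```

Comparing the stored values directly, rather than what the UI displays, is the point of the sketch: in this incident the UI indicated that all four templates were configured identically while the database held a different value for one of them.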

Thanks John.

In particular, I appreciate that this incident report made sure to call out relevant historic details, as well as the relationship to, and differences from, other Microsoft CA incidents, such as Bug 1711147.

That said, I think you've hit on another pattern which we've seen come up in other CA incidents, namely: CA configuration that relies on human factors (such as UIs) can lead to issues, either in following the same procedure (e.g. typos) or in correctly propagating configuration (e.g. situations where you get different results if A is clicked before B than if B is clicked before A).

For example, Bug 1707073 was about linter configuration, while Bug 1676352 looked into certificate profile management/configuration.

To that end, to what extent do Microsoft tools require configuration via UI, rather than via a deterministic and automatically repeatable process (such as source control or scripted automation), and is the 21 July work to fully replace all things depending on human UI, or just a portion?

Flags: needinfo?(johnmas)
Assignee: bwilson → johnmas
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

I checked with a Mozilla engineer, who said: "Firefox rejects those CAs with SEC_ERROR_INADEQUATE_KEY_USAGE, which is not overridable, so I don't see much of a benefit to putting them in OneCRL."

Therefore, I would like to not add these certificates to the CCADB.

Please let me know if that will cause problems for anyone.

Thanks,
Kathleen

Ryan, our Offline CA issuance process has a mix of configuration via UI or other human factors and more deterministic and automatically repeatable processes. Overall, we have more deterministic processes for about 90% of the steps and human factor processes for the remaining 10%. In addition, our security and process requirements require at least three people to execute these processes and ensure that each step in the process is done per documentation.

Creating templates is not a part of our normal Offline CA issuance process but is used when setting up a new template or changing an existing template. Now that the templates are correctly configured on the two live Public TLS Roots (RSA and ECC) that we manage, this will further reduce the human factors and enforce repeatable processes.

The remediation step for 21 July is focused primarily on human factor controls during template creation/update and will add defense in depth to prevent a similar issue from repeating itself.

Kathleen – thank you for your help in sorting through the CCADB issues for these 4 certificates.

Flags: needinfo?(johnmas)

We are still on track for delivering the remaining open remediation next week.

Open Remediation(s):
• We will update our Template creation process to compensate for the shortcomings of our internal template tools as identified in this bug. We will add steps to further interrogate the database in our internal tools to check the configuration of the template before it is used. We will not configure any new templates until this process is updated. (21 July 2021).

John Mason is OOF this week, so I am providing an update: We have completed updating our Template creation process as planned.

We have completed all planned remediations for this issue and believe our Template creation process is much improved. We ask that this bug be resolved at this time.

I will schedule this to be closed next Wed. 7-Aug-2021.

Flags: needinfo?(bwilson)

I am standing in for John, while he is on vacation this week. Thanks for the update Ben.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ca-misissuance]