Closed Bug 1654967 Opened 10 months ago Closed 5 months ago

DigiCert: Malformed ICA

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: martin.sullivan, Assigned: martin.sullivan)

Details

(Whiteboard: [ca-compliance])

Attachments

(2 files)

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

As part of our post-issuance review of a key ceremony, we discovered that an issuing CA had been created that did not meet the Baseline Requirements. Specifically, the profile lacked the CRL extension. After discovering the error, we ran a scan over all CAs capable of issuing TLS certificates and found 9 additional ICAs that had errors resulting from improper profiles during key ceremonies. We are including all of these in this incident report. The nine additional issuing CAs included one with an incorrect chain of signatures in an ECDSA CA (violating the Mozilla policy adopted in January 2020) and eight that used anyPolicy for an external, unaffiliated party (which is not allowed under BR section 7.1.6.3).

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

21-Feb-20 Ceremony to create ECC ICA (with incorrect ECDSA signatures)
20-May-20 Ceremony to create MS CAs with AnyPolicy
18-Jul-20 Ceremony to create a new Issuing CA. (missing CRL)
18-Jul-20 ICA failed post creation Linting (missing CRL)
18-Jul-20 ICA uploaded to CCADB (missing CRL)
21-Jul-20 ICA revoked in next ceremony (missing CRL)
22-Jul-20 Review of previous Ceremonies
23-Jul-20 Additional ICAs found in scope (ECC ICA and MS ICAs)
29-Jul-20 Additional ICAs scheduled for revocation within the 7 day requirement

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

DigiCert has stopped all new ceremonies for publicly trusted certificates until the key ceremony process contains more automation. We have some automation already, but the human component could flag issues as false positives and continue a key ceremony. We have attached a diagram showing the key ceremony process and where we have automation already and what we are currently building. The green boxes are all tools that already exist. However, we don’t have the automated flow through those tools to terminate the key ceremony process if there is an error. For example, in this instance of the ICA missing the CRL, the linter did catch the error, but the key ceremony team failed to abort when encountering the error and continued the key ceremony. The anyPolicy was not flagged by a linter (as it is allowed for hosted ICAs but not non-affiliated ICAs). The ceremony with the bad ECC ICA was held before zlint finished updating and was not flagged as bad by the ceremony tool.

We plan on stopping issuance until the part of the process in green is automated, making it impossible to proceed and complete a key ceremony on a linter error. After we finish requiring mandatory flow through the automated tools and preventing overrides, we will resume key ceremonies while we work on the blue items. The purple items (CCADB related) are subject to Salesforce integration. We'd like to automate those and will do so immediately after receiving approval from Mozilla.
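
The hard stop described above can be pictured with a minimal sketch. It is an illustration only, not DigiCert's ceremony tool: the `Finding` type, the `gate_ceremony` function, and the severity strings are all hypothetical names. The point it shows is that the gate has no override parameter, so an operator cannot wave an error through as a false positive.

```python
from dataclasses import dataclass


class CeremonyAborted(Exception):
    """Raised when a lint finding blocks the key ceremony."""


@dataclass(frozen=True)
class Finding:
    lint: str      # name of the lint that fired (hypothetical)
    severity: str  # "error", "warn", or "info" (assumed levels)


def gate_ceremony(findings):
    """Abort on any error-level finding; there is deliberately no override flag."""
    errors = [f for f in findings if f.severity == "error"]
    if errors:
        names = ", ".join(f.lint for f in errors)
        raise CeremonyAborted(f"lint errors block the ceremony: {names}")
    return "proceed"
```

With a gate of this shape, a finding such as a missing CRL path raises `CeremonyAborted` before any signing step can run, rather than relying on the team to notice the message.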

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

Ten ICA certificates were issued between 21-Feb-20 and 18-Jul-20.

ECDSA Cert (issue 1)
https://crt.sh/?id=2517734974

Microsoft CA with Any Policy (issue 2)
https://crt.sh/?id=2841732843
https://crt.sh/?id=2841732828
https://crt.sh/?id=2841732842
https://crt.sh/?id=2841732827
https://crt.sh/?id=2841732943
https://crt.sh/?id=2841732835
https://crt.sh/?id=2841732847
https://crt.sh/?id=2841732837

Missing CRL (issue 3)
https://crt.sh/?id=3112858733

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

See #4 above.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

For the ECDSA CA, this was created prior to the linting check. During the ceremony, we did not detect that the signature was not one permitted under the Mozilla policy. It was only through the recent scan using an updated zlint that we detected the issue.
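
One plausible form of a ceremony-time check for this class of error is sketched below. It assumes the pairing listed later in this thread (a P256 issuer key signs with SHA256, a P384 issuer key signs with SHA384). The function name and the string labels for curves and hashes are hypothetical, and the sketch does not claim to reproduce what zlint checks.

```python
# Curve of the issuer key -> hash its signatures must use.
# The pairing is taken from the checklist posted later in this thread.
REQUIRED_HASH = {"P256": "SHA256", "P384": "SHA384"}


def check_ecdsa_signature(issuer_curve, signature_hash):
    """Return a list of error strings; an empty list means the pairing is acceptable."""
    expected = REQUIRED_HASH.get(issuer_curve)
    if expected is None:
        return [f"issuer curve {issuer_curve} is not allowed"]
    if signature_hash != expected:
        return [
            f"issuer key on {issuer_curve} must sign with {expected}, "
            f"not {signature_hash}"
        ]
    return []
```

The check is keyed on the issuer's curve rather than the subject's, because the signature on an ICA is produced by the key of the CA above it.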

We do have developers who contribute to zlint already. To expand the scope of this, we are adding additional linters to the key ceremony for DigiCert-specific items, like the anyPolicy issue. We are also regularly running zlint over the corpus of intermediates to detect issues as zlint is updated. We are syncing the whole process so that any error hard-blocks the cert creation, even if it is a false positive.

The Microsoft CAs were created on legacy naming documents, similar to this bug https://bugzilla.mozilla.org/show_bug.cgi?id=1647084. The naming docs for internal CAs use anyPolicy for TLS. We are building into the linter a requirement that each TLS cert include all four TLS issuance OIDs. This will prevent the need for anyPolicy going forward.

In the case of the missing CRL, the naming documents actually included a CDP field. However, the naming documents failed to transfer to the key ceremony tool properly. The linter flagged the cert when a private test cert was created and caught the issue again when run on the completed certificate. However, the team failed to recognize the error message and proceeded with key creation regardless. We scheduled revocation for the next ceremony.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

First, we have stopped all key ceremonies while we finish our key ceremony automation process. The updated process will automate most of the process, excluding parts like the key ceremony itself where network separation is required. We expect phase 1 to be completed next week. Phase 2 will be scoped and started after the release of phase 1. The new process is nearly identical to the current process, except that the system moves the request through the process automatically and hard-stops the key ceremony on any linting error.

Assignee: bwilson → martin.sullivan
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

I thought it might be interesting to provide some additional information on the checks we have during the ICA process and what we are adding. Although some of these CA checks are duplicative of zlint, they are all in addition to zlint. I'll be refining these with additional checks over time as I comb through the BRs, Mozilla policy, and RFC 5280. Any suggestions are appreciated.

  1. All CAs are required to have a local configuration that includes at a minimum the AIA file info: {"extensions":{"aia":{"issuerFile":"issuerfile.crt"}}}

  2. Key Usage Checks
    a. CA certificates must have digitalSignature, keyCertSign, and cRLSign set and no others
    b. OCSP responder certificates must have digitalSignature set and no others
    c. All certificates must have key usage marked as a critical extension

  3. EKU Checks. The EKU extension must not be marked as critical. ICAs must have one of the following pairs: a) serverAuth and clientAuth OR b) codeSigning and timeStamping OR c) clientAuth and emailProtection. OCSP responder certs may contain ONLY ocspSigning

  4. OCSP:
    a. OCSP URI is required.
    b. OCSP URI must start with http://.
    c. OCSP URI must be at least 10 characters long
    d. OCSP URI must not contain leading or trailing whitespace
    e. If OCSP responder certificate OCSP path must not be present and CRL path must not be present

  5. CDP Checks
    a. At least one CRL path is required
    b. All paths must start with http://
    c. All paths must have a length of at least 10
    d. All paths must not contain leading or trailing whitespace
    e. All paths must end with .crl

  6. AIA Issuer Checks (new)
    a. AIA issuer path is required
    b. AIA issuer path must start with http://
    c. AIA issuer path must be at least 10 characters long
    d. AIA issuer path must not have leading or trailing spaces
    e. AIA issuer path must end with .crt or .cer
    f. If OCSP responder cert AIA issuer path must not be specified

  7. Key Size Checks
    a. RSA: Key size must be at least 2048, Key size must be a multiple of 8, Public exponent is hard coded to 65537
    b. ECC: Must be curves P256 or P384
    c. If OCSP responder cert key type must match issuer

  8. Signature Algorithm Checks
    a. If issuer RSA key support only SHA256, SHA384, and SHA512
    b. If issuer ECC key then P256 curve must use SHA256 and P384 curve must use SHA384

  9. Validity Checks
    a. Validity must be between 1 and 15 years
    b. Valid to and valid from must be correctly formatted dates
    c. Valid to must be greater than valid from

  10. Other Checks
    a. If name constraints are present they must be marked as critical
    b. SKI cannot be excluded
    c. AKI cannot be excluded
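
The OCSP, CDP, and AIA issuer items in the list above share the same rules: the value is required, must start with http://, must be at least 10 characters long, must have no leading or trailing whitespace, and (for CRL and AIA issuer paths) must end with a given suffix. A minimal sketch of one shared helper follows. The name `check_uri` and its parameters are hypothetical, and the example hostnames are placeholders.

```python
def check_uri(uri, label, suffixes=()):
    """Apply the shared URI rules from the checklist; return a list of error strings."""
    if not uri:
        return [f"{label} is required"]
    errors = []
    if uri != uri.strip():
        errors.append(f"{label} must not contain leading or trailing whitespace")
    if not uri.startswith("http://"):
        errors.append(f"{label} must start with http://")
    if len(uri) < 10:
        errors.append(f"{label} must be at least 10 characters long")
    # Only CRL and AIA issuer paths have a required ending; OCSP URIs pass no suffixes.
    if suffixes and not uri.endswith(tuple(suffixes)):
        errors.append(f"{label} must end with one of: {', '.join(suffixes)}")
    return errors
```

For example, `check_uri("http://crl.example.com/ica.crl", "CRL path", (".crl",))` returns an empty list, while an empty value returns the single error that the CRL path is required, which is the condition behind the missing-CRL certificate in this report.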

Note: rsa_mod_factors_smaller_than_752 and rsa_mod_not_odd are ignored because the fake RSA public key used in some preview certificates will produce these errors. Improving the fake RSA public key may be addressed in the future.
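
A narrow ignore list of this kind could be applied as in the sketch below. The input shape (a mapping from lint name to result string) and the one-letter severity prefix on lint names are assumptions loosely modeled on zlint output and have not been verified against a specific zlint release. `blocking_lints` and `base_name` are hypothetical helpers.

```python
# Lints ignored for preview certificates, as named in the note above.
IGNORED = frozenset({"rsa_mod_factors_smaller_than_752", "rsa_mod_not_odd"})


def base_name(lint_name):
    """Strip an assumed one-letter severity prefix such as 'e_' or 'w_'."""
    prefix, sep, rest = lint_name.partition("_")
    return rest if sep and len(prefix) == 1 else lint_name


def blocking_lints(results, ignored=IGNORED):
    """Return sorted names of failing lints that are not on the ignore list."""
    return sorted(
        name
        for name, result in results.items()
        if result in ("error", "fatal") and base_name(name) not in ignored
    )
```

Keeping the ignore list to exact lint names, rather than letting an operator dismiss findings case by case, preserves the hard-fail behavior for everything else.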

We are currently adding the additional checks (scheduled for deployment July 29):

  1. Subject Field may only contain the CN, O, and C fields
  2. All C fields must be a 2 character country code.
  3. Certificate policies:
  4. Cert Policy Checks (new)
    a. If serverAuth or codeSigning EKU is present
    b. If serverAuth EKU is present and the certificate is hosted then the exact following policy oids are required
    2.23.140.1.2.1
    2.23.140.1.2.2
    2.23.140.1.2.3
    2.23.140.1.1
    c. If serverAuth EKU is present and the certificate is either Apple or MS then the exact following policy oids are required
    2.23.140.1.2.2
    2.23.140.1.2.3
    2.23.140.1.1
    d. If codeSigning EKU is present the exact following policy oids are required
    2.23.140.1.3
    2.23.140.1.4.1

No other oids including those with a URL are allowed
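
The exact-set rule above can be sketched as a comparison between the policy OIDs in the certificate and the set required for its profile. The OID values are copied from the list above; the profile keys (`tls_hosted`, `tls_apple_or_ms`, `code_signing`) and the function name are hypothetical labels chosen for illustration.

```python
ANY_POLICY = "2.5.29.32.0"  # anyPolicy, the OID at issue in this incident

# Required policy OIDs per profile, copied from the checklist above.
REQUIRED_POLICIES = {
    "tls_hosted": frozenset(
        {"2.23.140.1.2.1", "2.23.140.1.2.2", "2.23.140.1.2.3", "2.23.140.1.1"}
    ),
    "tls_apple_or_ms": frozenset(
        {"2.23.140.1.2.2", "2.23.140.1.2.3", "2.23.140.1.1"}
    ),
    "code_signing": frozenset({"2.23.140.1.3", "2.23.140.1.4.1"}),
}


def check_policies(profile, policy_oids):
    """Require exactly the profile's OID set; return a list of error strings."""
    required = REQUIRED_POLICIES[profile]
    present = set(policy_oids)
    errors = []
    if ANY_POLICY in present:
        errors.append("anyPolicy is not allowed")
    missing = sorted(required - present)
    extra = sorted(present - required - {ANY_POLICY})
    if missing:
        errors.append("missing policy oids: " + ", ".join(missing))
    if extra:
        errors.append("unexpected policy oids: " + ", ".join(extra))
    return errors
```

A certificate for the Apple or MS profile that carries only anyPolicy would fail twice under this sketch: once for anyPolicy itself and once for the three missing OIDs.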

b) codeSigning and timeStamping OR c) clientAuth and emailProtection

This sounds like, similar to not reading Mozilla policy requirements, you may have missed Microsoft policy requirements?

https://docs.microsoft.com/en-us/security/trusted-root/program-requirements

Issuing CA certificates that chain to a participating Root CA must be constrained to a single EKU (e.g., separate Server Authentication, S/MIME, Code Signing, and Time Stamping uses. This means that a single Issuing CA must not combine server authentication with S/MIME, code signing or time stamping EKU. A separate intermediate must be used for each use case.

This has also been incorporated into the BRs, with SC31.
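
Read literally, the requirement quoted above means an issuing CA may carry at most one of the server authentication, S/MIME, code signing, and time stamping purposes. A minimal sketch of that check follows. Treating clientAuth as unrestricted is an assumption made here for illustration, and `check_single_purpose` is a hypothetical name, not part of any root program's tooling.

```python
# Purposes that must each live in a separate intermediate, per the quoted text.
RESTRICTED_PURPOSES = ("serverAuth", "emailProtection", "codeSigning", "timeStamping")


def check_single_purpose(ekus):
    """Return an error if more than one restricted purpose is present."""
    found = [p for p in RESTRICTED_PURPOSES if p in ekus]
    if len(found) > 1:
        return ["EKU combines separate use cases: " + ", ".join(found)]
    return []
```

Under this sketch the pair codeSigning plus timeStamping is rejected, while serverAuth plus clientAuth is accepted.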

e. If OCSP responder certificate OCSP path must not be present and CRL path must not be present

You may want to make sure you've got id-kp-pkix-nocheck then, as otherwise OCSP responses will fail to validate on Windows platforms.

e. AIA issuer path must end with .crt or .cer

And that you've verified it has the correct MIME type and encoding? :)

Subject Field may only contain the CN, O, and C fields

Encoded as valid UTF-8 within UTF8String?

I think, going back to the incident response in Comment #0, I don't really see a good systemic explanation for how the issue happened. If I'm understanding correctly, it's "the wrong form was filled out", but that feels like only a surface examination of the issue and how it came to be.

Flags: needinfo?(jeremy.rowley)

Thanks - I caught the time stamping/code signing EKU mistake in the documentation after I posted. It is fixed in the code.

The EKU check restricts the issuing CA from having both code signing and time stamping. I've added a check (should be in development shortly) for encoding with UTF8 and with id-kp-pkix-nocheck as well. The ceremony tool hard-fails now on zlint as well, so having it at the CA and zlint will give us double coverage in case our internal check fails or someone removes the check from zlint accidentally. I've asked the team to make sure it has the right MIME type and encoding as well.

For the UTF-8 within the UTF8String, do you have an example of one someone did wrong? I think we are doing it correctly but I'd like to compare with what a wrong one looks like.

For systemic issues, I think we felt that on multiple bugs at once. This incident, the CCADB listing (https://bugzilla.mozilla.org/show_bug.cgi?id=1647084) and the EV audit incident (https://bugzilla.mozilla.org/show_bug.cgi?id=1650910) are closely related in root cause and solution.

Root causes:

  1. Problem: Staff uses insufficient caution when performing critical operations.
    Explanation: We've done a full assessment on the manual processes with the key ceremony and audit process. We have these manual steps categorized and are systematically moving through the list to remove them from our operations. Although we can't remove all of the manual steps since key ceremonies are offline, we can harden the tools around the key ceremony to prevent staff from generating certificates with non-compliant fields. We ran a full scan using zlint on all ICAs we've created. I am currently reviewing the results, but I don't see any that jump out as problematic.
    Prevention: Eliminate manual operations and restrict staff that can operate key material to only those that have a history of careful execution on operations. The new key ceremony tool is live for TLS certs. We are working on one for S/MIME next and completing the rest of phase 2.

  2. Problem: Insufficient controls around key ceremonies
    Explanation: Similar to the first root cause, we relied on the experience of staff rather than technical controls to enforce changes. This can lead to issues where processes or requirements change but habit limits the adaptability. In this case, we ran zlint and used a key ceremony tool to generate the sub CAs. Zlint did generate an error on the Microsoft ones but the zlint errors were ignored as false positives by the key ceremony team.
    Prevention: Eliminate any ability to ignore controls and ensure strict compliance. Better to not issue a cert than allow a wrong cert to potentially issue. The new ceremony tool aborts when hitting an issue instead of allowing the staff to continue.

  3. Problem: Insufficient technical audits around 5280 requirements
    Explanation: We've spent a lot of time monitoring and working on improving the validation processes. We also have controls around the baseline requirements and Mozilla policy for end entity certificates. There is less auditing around sub CAs and the implication of the RFCs on them. We need to extend the internal auditing scope to include technical controls for sub CAs under 5280 and similar documents.
    Prevention: We are creating more technical controls around 5280.

Overall, the underlying root cause was focus on end-entity certificates and shoring up those processes without applying the same controls to Sub CA creation.

Hi Jeremy. From bug #1650910 comment #17 and comment #18, I think this bug is still awaiting disclosure of, and an explanation for, the misissuance of https://crt.sh/?id=3112858731 (which has a "Policy OID" of 1.3.6.1.5.5.7.2.1, which is actually the OID defined by RFC5280 for the CPS Pointer policy qualifier).

Thanks Rob - I've asked Martin to post the updated incident report right away.

Thanks for that Jeremy.

In regards to the 2 ICAs:
https://crt.sh/?id=3112858731
https://crt.sh/?id=3112858734

These ICAs were created and run through our linters as usual, with no issue found. Since this is not checked by zlint, the issue was not flagged by our system.

These ICAs were created to replace ones needing replacement due to the OCSP EKU bug. They were created on July 16, 2020 and revoked a few days later, in the next ceremony on July 22, 2020, after a manual review found the profile was not what we wanted. Because the linter did not trigger, we were not alerted to create a case at the time.

When we did a system sweep of problematic CAs, these two were not found due to their revoked status.
We looked back at this after opening this bug and found that they were actually in breach of RFC 5280 section 4.2.1.4.
Specifically: “A certificate policy OID MUST NOT appear more than once in a certificate policies extension.”

We are therefore adding them to this bug for the record. This case will be added to the checks in the certificate creation software so that we have confidence this will not happen again.

Hi Martin.

The Certificate Policies extension uses OIDs in two separate ways: CertPolicyId and PolicyQualifierId. I believe that RFC5280 section 4.2.1.4's statement that "A certificate policy OID MUST NOT appear more than once in a certificate policies extension" is intended to refer only to the CertPolicyId OID(s).

In these 2 ICA certificates, the OID 1.3.6.1.5.5.7.2.1 appears once as a CertPolicyID, and once as a PolicyQualifierId OID, so I disagree that those 2 certificates are in breach of that particular clause in RFC5280 section 4.2.1.4.

RFC5280 section 4.2.1.4 defines the OID 1.3.6.1.5.5.7.2.1 (aka id-qt-cps) as a standard PolicyQualifierId value. I think this implies that that OID MUST NOT be used as a CertPolicyId at all.

Ryan: Is this what you were getting at in https://bugzilla.mozilla.org/show_bug.cgi?id=1650910#c17 ?

Flags: needinfo?(ryan.sleevi)

Correct.

It would be akin to using an rsaEncryption OID as a CertPolicyID
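
The distinction drawn in this exchange can be sketched as two separate checks on the certificate policies extension: a CertPolicyId must not repeat, and a CertPolicyId must not be one of the OIDs that RFC 5280 defines as policy qualifier identifiers. The input shape (a list of pairs of policy OID and qualifier OIDs) and the function name are hypothetical; the two qualifier OIDs are the id-qt-cps and id-qt-unotice values from RFC 5280.

```python
# Policy qualifier identifiers defined in RFC 5280 section 4.2.1.4.
QUALIFIER_OIDS = {
    "1.3.6.1.5.5.7.2.1": "id-qt-cps",
    "1.3.6.1.5.5.7.2.2": "id-qt-unotice",
}


def check_cert_policy_ids(policies):
    """policies: list of (cert_policy_id, [policy_qualifier_ids]) pairs."""
    errors = []
    seen = set()
    for policy_id, _qualifier_ids in policies:
        if policy_id in QUALIFIER_OIDS:
            errors.append(
                f"{policy_id} ({QUALIFIER_OIDS[policy_id]}) is a policy "
                "qualifier id and must not be used as a CertPolicyId"
            )
        if policy_id in seen:
            errors.append(f"CertPolicyId {policy_id} appears more than once")
        seen.add(policy_id)
    return errors
```

Qualifier identifiers are deliberately left out of the duplicate check, so a policy that uses id-qt-cps as its qualifier passes, while a policy whose identifier is id-qt-cps fails.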

Flags: needinfo?(ryan.sleevi)

Latest update,
The TLS side of our automated Key Ceremony is complete and is now being used for Ceremonies, with results as expected.
The SMIME/Client tests are being QA'ed now and we expect them live in the next 2-4 weeks

SMIME/client cert key ceremonies are live. We are working on improving the automation with CCADB still and streamlining the system.

Duplicate of this bug: 1664325

No updates yet. We are evaluating the best way to complete the remaining work on automating the process.

We are working on implementation of phase 2 (automation of the naming documents). We have about two sprints remaining to finish it up, meaning that we expect this to be live Nov 1. We should have phase 3 live at about the same time (automation of CCADB upload).

Thanks Jeremy. I realize Martin touched a bit on the naming document in Comment #0, but I'm hoping you can expand with a bit more detail regarding phase two. The comment simply said:

Phase 2 will be scoped and started after the release of phase 1.

So better understanding the scope and what is being worked on is useful, to understand the lifecycle from birth-through-disclosure, which is presumably what's happening.

Sure thing! I attached a draft of what we are building for key ceremonies. The flow is:

  1. An internal stakeholder needs a key ceremony for a customer (private, public, TLS, smime, whatever - it all follows the same flow). They use the attached webform to create a request. The fields on the request form are locked down to specific allowed values. For example, it won't let you select a profile if it would mix TLS and smime.
  2. The request kicks off a multi-faceted approval process, requiring approval from operations, validation (if public), and compliance.
  3. (Already in place) The tool generates a dummy cert that contains all of the contents. The contents are run through zlint and our internal scanners to see if there are compliance issues. If there is an error, the key ceremony can't proceed since the code won't generate a key ceremony script. If there isn't an error, a key ceremony script/tool is generated. The tool signs the exact dummy cert that was generated in step 3 (which takes out the human element).
  4. (Already in place) The operations and compliance team hold an off-line key ceremony using the generated tool to create the new ICA. This tool is limited to what was approved on the naming document. The key ceremony tool is also locked down so it can't generate any certs that use bad algorithms or improper fields.
  5. After the key ceremony, the team is required to upload the completed cert back into the online key ceremony flow. A script then checks the uploaded cert against the naming document generated in step 1 to confirm the cert matches exactly what was requested. If a file is not uploaded within [24] hours of a scheduled key ceremony, alarm emails go out asking where the file is.
  6. We are currently working with a contractor who is integrating us with CCADB. After the match is complete, the tool calls the CCADB API to add the newly signed CA to the CCADB listing with the relevant audit and CPS info.
  7. After CCADB confirms upload, the tool adds the issuing CA to the list of available ICAs for the relevant software.
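
Step 5 of the flow above can be sketched as two small checks: an exact field-by-field comparison between what was approved and what was uploaded, and a timer that raises an alarm when no file has arrived. The dictionary representation of a certificate, the function names, and the default of 24 hours (shown in brackets in the description, so presumably still tentative) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

_MISSING = object()  # sentinel so an absent field never equals a present one


def field_mismatches(approved, uploaded):
    """Return sorted field names whose values differ between the two records."""
    keys = sorted(set(approved) | set(uploaded))
    return [
        k for k in keys
        if approved.get(k, _MISSING) != uploaded.get(k, _MISSING)
    ]


def upload_overdue(ceremony_time, now, uploaded, limit_hours=24):
    """True when no cert was uploaded within limit_hours of the ceremony."""
    return (not uploaded) and (now - ceremony_time > timedelta(hours=limit_hours))
```

A certificate that silently lost its CRL field between the naming document and the ceremony tool would show up in `field_mismatches` as a difference on that field, independent of any linter.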

This is very much in flux while we build it, and we have several components going at the same time. Things may be slightly different when we get closer to completion. We are still tracking the first part of November to have the project complete.

Where would you like more details on this process?

the tool calls the CCADB API to add the newly signed CA to the CCADB listing with the relevant audit and CPS info.

Jeremy, AFAICT ccadb.org does not mention any CCADB API for submitting intermediate CA certificates. Instead, CAs are asked to either "enter them by hand" (https://www.ccadb.org/cas/intermediates) or ask a root store operator to perform a "mass import" (https://www.ccadb.org/cas/massimport) of 20 or more intermediate CA certificates from a CSV file.

Where would you like more details on this process?

Yes please! Where can I find documentation for this CCADB API?

There isn't an API quite yet - it's being worked on. Let me see where the people working on it are at with the documentation and share a link.

We are progressing on this project and have completed most of the engineering tasks, along with 99% of the UX for the review form. The latter part of this week is dedicated to testing and tweaks; with completion at end of the sprint (assuming no surprises in the next week). We should be live with the full solution mid-November.

All features are complete and the system is in final testing. We expect to be fully live in production on Friday of this week. Any final questions before we finish this off? If not, I'll plan a key ceremony for the next week or two to demonstrate the system working then request closure.

Flags: needinfo?(jeremy.rowley)

Update: All code is complete on the ceremony tool. We are waiting for the next public CA signing to demonstrate completion, which will happen after next week's US holiday. We will update again when we complete the key ceremony with the results.

We completed the Public Key Ceremony with the new tool today.

This went as expected, and the result was confirmed using external checks/lints after the ceremony.

Is there anything else needed, or can we close this off?

Are there any additional questions from the community? If not, I will close this matter next week (Dec. 14-18).

Flags: needinfo?(bwilson)

(In reply to Jeremy Rowley from comment #18)

There isn't an API quite yet - it's being worked on. Let me see where the people working on it are at with the documentation and share a link.

Jeremy, are you able to share any further details about this CCADB API?

Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED