Closed Bug 1650910 Opened 9 months ago Closed 6 months ago

DigiCert: Inconsistent EV audits

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brenda.bernal, Assigned: brenda.bernal)

Details

(Whiteboard: [ca-compliance] - Next Update - 21-Sept-2020)

Attachments

(3 files)

2.70 MB, application/vnd.ms-excel
507.86 KB, application/vnd.ms-excel
235.55 KB, application/vnd.ms-excel
  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

As part of the Mozilla bug https://bugzilla.mozilla.org/show_bug.cgi?id=1647084, Rob Stradling posted a link to crt.sh listing some CAs with inconsistent audits https://crt.sh/mozilla-disclosures#disclosedwithinconsistentaudit. After reviewing, we confirmed there was an issue and we investigated further.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

01-Aug-2013: First issuing certificate in scope signed.
09-Feb-2018: Last issuing certificate in scope signed.
02-Jul-2020, 15:17: Rob Stradling posts https://crt.sh/mozilla-disclosures#disclosedwithinconsistentaudit to Bugzilla.
02-Jul-2020, 22:00: Internal investigation started.
05-Jul-2020, 23:00: Investigation and remediation plan discussed internally; we began building the revocation scope list with a timeline.
06-Jul-2020: EV issuance blocked for ICAs in scope.
06-Jul-2020: This bug was posted.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

DigiCert is in the process of halting all EV issuance from CAs omitted from an EV audit report. These issuing CAs are properly listed in the WebTrust CA and BR audit reports and are still issuing non-EV certificates. We expect to replace the ICAs for EV issuance today (July 6), and we will revoke active EV certificates issued under these ICAs. We’re still determining the number of EV certs impacted, but we estimate approximately 50k.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

This issue impacts a number of ICAs created between 01 Aug 2013 and 09 Feb 2018. Going forward, all ICAs capable of issuing TLS will be listed in both the EV audit and WebTrust audit. Since all TLS issuing CAs are also capable of issuing EV, all ICAs should be included in the EV report.
Here’s the list of ICAs that have issued EV certs without being listed in the EV audit report:
DigiCert Global CA G2 https://crt.sh/?id=8656330
GeoTrust TLS RSA CA G1 https://crt.sh/?id=250864679
Thawte TLS RSA CA G1 https://crt.sh/?id=250864680
Secure Site CA https://crt.sh/?id=329144662
NCC Group Secure Server CA G2 https://crt.sh/?id=5836038
TERENA SSL High Assurance CA 3 https://crt.sh/?id=5797998
For clarity, all ICAs listed above were included in CA and BR audits but not the EV audit.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

We are working on a list of impacted end-entity certs and will publish it later this week. Note that all EV end-entity certs were covered by the EV audit, within the scope of the auditor’s sample. What was out of scope was having the auditor list all of the TLS ICAs in the report. We are revoking the impacted end-entity certs. We think this is a first for the industry but are basing the revocation and plan on previous scenarios where certs are adequately covered by an audit.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

In the past, ICAs were listed in audit reports based on planned usage rather than on whether they were capable of issuing EV, meaning that not all TLS issuing CAs were listed in the audit report. This is separate from how we pull EV data for the auditor sample, where the sample is pulled from all issued certs, regardless of chain. The result is an unusual situation where all of the certs were tested against the EV requirements, but the audit report did not list the specific ICA. Because of this, we are revoking all of the end-entity EV certs and moving them to a new chain. We’re also moving up the audit to start in August with an earlier period end than the prior year, and we will list all ICAs in the next audit report.

All EV certificates issued from these ICAs were included in the sample population extract for our EV audits in previous years. However, the list of EV ICAs in the report has been based on the previous year’s scope list with any deltas (additions for new signings and removals for ICAs revoked or expired within the audit period). The process for including new ICAs in the audit scope lists is changing for our 2019-2020 audit period to cover all possible usage and TLS cert types.
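To make the root cause concrete, here is a minimal Python sketch that contrasts the two ways of building the EV scope list described above. The first carries last year's list forward with deltas. The second derives the list from technical capability. The `IssuingCA` record, its fields, and the sample ICA names are hypothetical. Only the carry-forward rule (prior list plus new signings, minus revoked or expired ICAs) comes from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IssuingCA:
    # Hypothetical record for illustration only.
    name: str
    tls_capable: bool      # can issue TLS server certs (and therefore EV)
    planned_for_ev: bool   # intended usage recorded when the ICA was signed

def carry_forward_scope(previous, added, removed):
    """Last year's scope list plus new signings, minus ICAs revoked or expired."""
    return (set(previous) | set(added)) - set(removed)

def capability_scope(cas):
    """Every TLS-capable ICA, regardless of planned usage."""
    return {ca.name for ca in cas if ca.tls_capable}

cas = [
    IssuingCA("Example EV ICA", tls_capable=True, planned_for_ev=True),
    IssuingCA("Example OV ICA", tls_capable=True, planned_for_ev=False),
]
# Hypothetical history: only ICAs planned for EV were ever added to the EV scope list.
previous = {ca.name for ca in cas if ca.planned_for_ev}
carried = carry_forward_scope(previous, added=[], removed=[])
missing = capability_scope(cas) - carried
print(sorted(missing))  # ['Example OV ICA']
```

A carry-forward list can only ever be as complete as the first list it started from. A capability-derived list does not depend on that history.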

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

DigiCert is disabling EV issuance for these ICAs and is in the process of revoking all valid EV certificates chained to one of the impacted ICAs. Starting with the 2019-2020 audit period, all TLS-capable issuing CAs will be covered in TLS-based audits: BR and EV. The scope list undergoes three levels of review before being finalized, and it’s checked against multiple sources (CCADB and our internal Root/CA database that captures all signings). We expect the audit for our upcoming period to be underway shortly.
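The multi-source check described above can be sketched as a simple reconciliation. This sketch assumes that each source (the audit scope list, a CCADB export, and the internal Root/CA database) can be exported as a set of certificate fingerprints. Any fingerprint that appears in some source but not in all of them is flagged for review. The function name, source labels, and sample data are hypothetical.

```python
def reconcile(sources):
    """For each source, list fingerprints that some other source has but it lacks."""
    union = set().union(*sources.values())
    gaps = {}
    for name, fingerprints in sources.items():
        missing = union - fingerprints
        if missing:
            gaps[name] = sorted(missing)
    return gaps

# Hypothetical, truncated fingerprints for illustration.
sources = {
    "audit_scope_list": {"AA11", "BB22"},
    "ccadb": {"AA11", "BB22", "CC33"},
    "root_ca_db": {"AA11", "BB22", "CC33"},
}
print(reconcile(sources))  # {'audit_scope_list': ['CC33']}
```

An automated diff like this complements the three levels of human review. It reports every gap, whereas a reviewer has to notice one.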

We think this is a first for the industry but are basing the revocation and plan on previous scenarios where certs are adequately covered by an audit.

I think this is a first as well, and I really want to commend DigiCert for doing this and filing this incident. This is one of the rare times that I've received an incident report and said "Wow, that's a good incident report". As DigiCert has struggled in this area in the past, I want to say that this is a marked turnaround and a really positive sign.

The only thing that I think is missing here is the timeline for this transition. It's not clear to me what the actionable update dates are, beyond the updated audit period, which sounds like it's shifting to August.

Assignee: kwilson → brenda.bernal
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(brenda.bernal)
Whiteboard: [ca-compliance]

Thanks Ryan. A couple of thoughts on timing:

  1. All ICAs not covered by the EV audit are being turned off for issuance today.
  2. The new audit is scheduled for Aug (the soonest we could get auditors in).
  3. Revocation will commence and hopefully complete on Friday of this week (we're trying to follow the five-day rule on this one). If, for whatever reason, we don't revoke 100% by Friday, we will file a separate incident report for delayed revocation and give you both the reason for the delay and the serial numbers not revoked at that time.

Sounds good. I'm setting Next-Update to cycle back at the end of August, although if anything changes in the timeline, please report that sooner.

Your described plan for revocation sounds good, and creating an incident report for anything that misses the window is a suitable update.

Flags: needinfo?(brenda.bernal)
Whiteboard: [ca-compliance] → [ca-compliance] - Next Update - 31 Aug-2020

Ryan,

Revoking over 50,000 certificates within 5 days is a draconian move that is only warranted when a severe security breach has been detected.
There needs to be some common sense in determining how long to allow before a certificate is revoked. Minor typos in province names or mistakes in audit reports should be given 2-4 weeks before the certificates are revoked.

The five-day deadline is set by Mozilla policy and the CA/Browser Forum requirements, not by Ryan. It's something the industry would need to look at resolving, especially when there are extenuating circumstances like a pandemic. However, it's unfortunately not something the CA/Browser Forum can change before July 11th.

The goal of the Web PKI is to ensure an agile ecosystem. The best way to ensure that agility is to make revoking 50,000 certificates easy, so that it does not matter whether or not a security breach is determined to be severe, because there's no disruption either way. Discussions about severity only serve to delay remediation and, more often than not, are not made in good faith. If revoking over 50,000 certificates is difficult, you should ask how you would handle what you deem a "severe security breach", which would require the same, and make sure your systems are robust enough to handle both scenarios. In particular, the goal is that a "severe security breach" results in no interruption or disruption, which is how we make security breaches less severe: by paying down improvements ahead of time, rather than at the time of the breach.

Unfortunately, when using a shared PKI like the global Web PKI, it's not sufficient to consider an individual organization's risk analysis. Even if something might, by your individual estimation, be an appropriate trade-off, it very frequently isn't in the aggregate. This is similar to, say, disabling SSL 3.0 or TLS 1.0: even if a single organization might determine it's appropriate for their use due to factors specific to their organization (e.g. it's layered within a robust VPN), in aggregate, it's simply insecure.

The failure to have appropriate audits is indistinguishable from a severe security breach, and consistently has been treated as a severe security breach. I appreciate DigiCert's recognition that, as a globally trusted CA, their brand is based on how well relying parties, which include browsers, are confident that DigiCert will put the global ecosystem security over individual customers' needs. This is because "trust" is "consistency over time". We need to trust that DigiCert will do the right thing, consistently, over time, and this bug is a recognition of the need to demonstrate that, even when it's inconvenient.

I won't bother you with my opinion on whether the cure is worse than the disease, but I do think it is confusing to read here that certificate revocation will start on Friday. The system administrators that have been working around the clock to replace their EV certificates were told that they have a deadline on July 11, 2020 at 12 pm MDT (July 11, 18:00 UTC). See https://knowledge.digicert.com/alerts/DigiCert-ICA-Replacement.

Some of them are now starting to panic when they read this page stating that they will have less than 24 hours to finish what seems to me an impossible task (thousands of servers are affected in our community alone).

Can someone here confirm that no certificate issued by any of these ICAs will need to be revoked by DigiCert as a consequence of this audit?

Based on the details provided, revocation is expected and required within the timeframe defined, as an industry-wide requirement.

We have posted a delay in revocation related to this bug here: https://bugzilla.mozilla.org/show_bug.cgi?id=1651828.

I would also like to correct https://bugzilla.mozilla.org/show_bug.cgi?id=1650910#c2 above: our five-day deadline falls on Saturday, July 11, 2020 at 12 pm MDT.
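As a quick check of the deadline arithmetic, the sketch below adds five days to an assumed start time and converts the result to UTC. The start time of 06-Jul-2020 12:00 MDT is an assumption chosen to match the stated deadline; this bug does not give it. MDT is modelled as a fixed UTC-6 offset, with no daylight-saving transitions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MDT as a fixed UTC-6 offset (valid for July; no DST handling).
MDT = timezone(timedelta(hours=-6), "MDT")

def revocation_deadline(start, days=5):
    """Return the deadline `days` after `start`, in the same timezone."""
    return start + timedelta(days=days)

start = datetime(2020, 7, 6, 12, 0, tzinfo=MDT)  # assumed start time
deadline = revocation_deadline(start)
print(deadline.strftime("%A %d %b %Y %H:%M %Z"))               # Saturday 11 Jul 2020 12:00 MDT
print(deadline.astimezone(timezone.utc).strftime("%H:%M %Z"))  # 18:00 UTC
```

This matches the deadline of July 11, 18:00 UTC quoted earlier in the thread.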

Mozilla's position is stated not just in the Root Store Policy but also in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , which addresses "the question about when this should be done, particularly if it's not possible to contact the customer immediately, or if they are unable to replace their certificate quickly..." It also says: "Mozilla recognizes that in some exceptional circumstances, revoking the affected certificates within the prescribed deadline may cause significant harm, such as when the certificate is used in critical infrastructure and cannot be safely replaced prior to the revocation deadline, or when the volume of revocations in a short period of time would result in a large cumulative impact to the web." And: "The decision and rationale for delaying revocation will be disclosed to Mozilla in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include an explanation for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis. Any decision to not comply with the timeline specified in the Baseline Requirements must also be accompanied by a clear timeline describing if and when the problematic certificates will be revoked or expire naturally, and supported by the rationale to delay revocation."

So, I think we recognize that this is a situation where not all certificates can be revoked within the BR timeframe. We would like DigiCert to present a well-explained plan in its delayed revocation bug; we much prefer a well-thought-out plan over something that is rushed through.

The revoked ones are attached.

Attached file COVID-5253_hashes.csv

Attached are the certificates whose replacement I considered to be impacted by COVID, plus one entity determined to be critical infrastructure, plus one entity whose certificates we are not permitted to revoke until July 20 because of a court order. I'll provide the reasons as well once I've finished organizing them.

I think this was filed under the wrong component. It should be moved to the CA Certificate Compliance component on Bugzilla.

Flags: needinfo?(bwilson)

I believe this bug is in the right component (at least, it's where we've tracked a number of audit issues), but I've gone ahead and moved it just in case. However, consistent with DigiCert's repeated failures to do things as expected, they posted the update here, as opposed to Bug 1651828, which is where this information is expected.

DigiCert folks: Please make sure to keep better track of your bugs than your audits. This bug should work to understand how and why your CAs were not part of the appropriate audits, and how that's being remediated.

Bug 1651828 should be used to track your continued failures to revoke on time, and to ensure that you provide the details required by https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , and it remains quite questionable whether that's been done. Information such as what wasn't revoked really belongs there, as do the per-Subscriber explanations, remediation, and mitigations. Given DigiCert has, unfortunately, ample experience with this, I encourage you to review your past incident bugs if there's confusion about what's expected.

Component: CA Certificate Root Program → CA Certificate Compliance
Flags: needinfo?(bwilson) → needinfo?(brenda.bernal)
QA Contact: kwilson → bwilson

Here's my initial analysis:
DigiCert still has several manual processes it follows when creating issuing CAs, uploading the issuing CAs to CCADB, and providing audit information. Unfortunately, the policy on EV audit scope was incorrect, meaning that someone following the document would wrongly exclude TLS issuing CAs from the scope of the audit report. The policy documents are not static, and several people reviewed the policy document over the years. However, the staff following the policy document were typically also the ones reviewing it, and they missed the underlying issue.

We updated the policy, but that’s obviously not enough to address what went wrong. The bigger root cause, and the one we need to fix immediately, is the manual processes required in the compliance and PKI Ops roles. As we scale, manual processes become increasingly likely to go wrong. I know we posted previously about the risks of a manual process (https://groups.google.com/forum/#!searchin/mozilla.dev.security.policy/linter%7Csort:date/mozilla.dev.security.policy/3iuG8KGryC4/_RTq_SxJBwAJ), but that thinking and the implementations that followed were not broad enough. We did implement a linter to help reduce manual processes in key ceremony review and better tooling to make key ceremonies operate more efficiently. We also implemented many process changes and procedural controls to better flag manual processes going wrong in several bugs (e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1647084, https://bugzilla.mozilla.org/show_bug.cgi?id=1451950). These help highlight danger, but there is still a manual component, meaning an incorrect policy or carelessness could still cause an issue. After reviewing past incidents, I am convinced that any manual process is a risk in our operations. Currently, we are looking through each process to see what is manual and what is not, particularly in our PKI Ops and Compliance operations. Shifting to the format you suggested:

  1. Problem: DigiCert PKI Ops and compliance processes contain manual steps. Despite built-in linters and similar safeguards, the number of process steps that require human involvement creates unnecessary risk and needs to be reduced or eliminated. Procedural safeguards are insufficient to prevent recurrence.

  2. Detection: We are currently looking at which processes are manual and what needs to be done to automate them. For example, we are looking at how we generate new CAs, how we create audit documents, and how we upload information to CCADB. Each of these has a manual element that needs to be changed.

  3. Prevention: This is a preliminary report on an ongoing investigation into what manual processes we have and are looking to eliminate. Considering the manual nature of key ceremonies, I’m not sure I can get to 100% automation in the process. However, internal research has yielded a couple of tasks that would remediate this instance and prevent future incidents.

a) Generation of audit reports. All audit reports need to be pulled directly from our root repository. To really make this instance a “never-again” scenario, we need to generate all CA scope information for the audit reports directly from the CA database. Anything publicly trusted will be listed on the WebTrust report. Anything with a serverAuth EKU will be listed on both the BR and EV reports. This is already how it should work, of course, but generating the reports directly from the CA database will prevent mistakes like the one in this incident.

b) Integration with CA Database and CCADB. I'd like to integrate key ceremony results directly with our CA database and CCADB, automating the delivery from key ceremony to the root databases. We've already started automating the flow from key ceremony into the CA database. I’m working with Kathleen and Ben on that to see if it’s possible.

We are continuing to look internally at additional processes to eliminate manual steps. I will post more as subsequent updates. We are also looking through all the other issuing CAs to see what else could have issued TLS certs without being on an EV audit report. We’re still scanning this list and will provide a full report of the results. None of the other ICAs missing from the EV audit report are marked as issuing EV.

Flags: needinfo?(jeremy.rowley)

For clarity, we do have automation in the compliance and CA creation processes. However, there are still some touchpoints that require human interaction. The risk and automation I'm referencing is end-to-end, from ICA creation to end of ICA life. We already had elimination of these touchpoints on the roadmap but are reprioritizing them based on this incident.

(In reply to Jeremy Rowley from comment #16)

For clarity, we do have automation in the compliance and CA creation processes. However, there are still some touchpoints that require human interaction. The risk and automation I'm referencing is end-to-end, from ICA creation to end of ICA life. We already had elimination of these touchpoints on the roadmap but are reprioritizing them based on this incident.

As we've seen from Bug 1654967, it's hard to believe DigiCert is capable of using their CA key material without causing some form of incident. While I can understand that "human interaction is hard", I'm concerned about the nearly three years of systemic failures by DigiCert, and that these incidents keep happening.

I'm raising it here because Bug 1654967 shows that statements like those in Comment #15, specifically:

We did implement a linter to help reduce manual processes in key ceremony review and better tooling to make key ceremonies operate more efficiently.

don't seem to be supported by things like https://crt.sh/?asn1=3112858731 (notably, the policy OID 1.3.6.1.5.5.7.2.1), for which I'm also having trouble finding an incident report. I'm also not really seeing a clear timeline from Comment #15 for improvements. Am I missing something?

I posted about the linter, its existing capabilities, and what we are adding in the malformed CA bug. I also asked Martin to discuss that particular cert on that bug as well since he was involved with the key ceremony for that certificate. He should be updating that incident shortly I think.

For timelines, we are rolling out the first part of the improved linter tools (specifically to catch policy OIDs and other situational checks) today. We will then continue working on the automation of key ceremonies. At the same time, we are working on the automated reporting. Our goal is to have the automated reporting in place before we start the audit in August. We're still speccing it out beyond simply generating a report of all roots, issuing CAs, and their usage, since we want to figure out if we can run ALV over the report generated by our internal CA database to ensure the list of CAs is properly formatted when exported. I'm not sure what this will take yet (or if it's even possible). I have a meeting with the team this Thursday where we will hopefully clarify how to do this.
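The kind of policy-OID check mentioned above can be sketched as a simple lint. This is a hypothetical check, not DigiCert's linter. It flags policy identifiers that are actually policy qualifier IDs from RFC 5280, such as 1.3.6.1.5.5.7.2.1 (id-qt-cps) in the certificate cited earlier. It also warns about OIDs outside an allow-list. That allow-list is an assumption: it covers only anyPolicy and the CA/Browser Forum reserved OIDs, and a real CA would add its own policy OIDs.

```python
# Policy qualifier IDs (RFC 5280): these belong in policyQualifiers, never as policy IDs.
QUALIFIER_OIDS = {
    "1.3.6.1.5.5.7.2.1": "id-qt-cps",
    "1.3.6.1.5.5.7.2.2": "id-qt-unotice",
}
# Assumed allow-list for illustration; a real CA would also list its own policy OIDs.
KNOWN_POLICY_OIDS = {
    "2.5.29.32.0": "anyPolicy",
    "2.23.140.1.1": "CA/B Forum EV",
    "2.23.140.1.2.1": "CA/B Forum DV",
    "2.23.140.1.2.2": "CA/B Forum OV",
    "2.23.140.1.2.3": "CA/B Forum IV",
}

def lint_policy_oids(policy_oids):
    """Return (severity, message) findings for a certificate's policy OIDs."""
    findings = []
    for oid in policy_oids:
        if oid in QUALIFIER_OIDS:
            findings.append(("error", f"{oid} ({QUALIFIER_OIDS[oid]}) is a qualifier ID, not a policy"))
        elif oid not in KNOWN_POLICY_OIDS:
            findings.append(("warning", f"{oid} is not in the allow-list"))
    return findings

print(lint_policy_oids(["2.23.140.1.2.2", "1.3.6.1.5.5.7.2.1"]))
```

The check is intentionally split by severity. A qualifier OID used as a policy is always wrong, while an unknown OID may just mean the allow-list needs updating.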

Flags: needinfo?(jeremy.rowley)
Flags: needinfo?(brenda.bernal)

The team did meet and put together a plan for automating the audit report. The automated report system is in development. The audit tool will create reports for the auditors as follows:

  1. For Baseline Requirements and WebTrust for EV:
    a. Chains to a root in a browser
    b. Where cA = true with the following EKUs: AnyPolicy/ServerAuth/No EKU/Any EKU
    c. Expired, revoked, or active since last audit

  2. For Code Signing:
    a. Chains to a root in a browser
    b. Where cA = true with the following EKUs: AnyPolicy/codeSigning/No EKU/Any EKU
    c. Expired, revoked, or active since last audit

  3. For WebTrust:
    a. Chains to a root in a browser
    b. Where cA = true
    c. Expired, revoked, or active since last audit
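The three selection rules above can be sketched as filters over CA certificate records. This is a minimal illustration, not the actual tool, and the following are assumptions:

- The `CACert` fields and the EKU labels are invented for the sketch.
- The text's "AnyPolicy" entry is modelled as the anyPolicy certificate policy OID (2.5.29.32.0).
- "Expired, revoked, or active since last audit" is read as "not expired or revoked before the start of the new audit period".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ANY_POLICY = "2.5.29.32.0"

@dataclass
class CACert:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    chains_to_trusted_root: bool
    is_ca: bool
    ekus: Optional[frozenset]            # None means no EKU extension
    policies: frozenset = frozenset()
    not_after: date = date.max
    revoked_on: Optional[date] = None

def in_period(cert, period_start):
    """Valid at some point since the last audit: not expired or revoked before it."""
    if cert.not_after < period_start:
        return False
    return cert.revoked_on is None or cert.revoked_on >= period_start

def eku_allows(cert, purpose):
    """True for no EKU, anyExtendedKeyUsage, the given purpose, or (assumed) anyPolicy."""
    if cert.ekus is None or ANY_POLICY in cert.policies:
        return True
    return bool(cert.ekus & {purpose, "anyExtendedKeyUsage"})

def select(certs, report, period_start):
    """Pick the CA certs in scope for 'br_ev', 'code_signing', or 'webtrust'."""
    base = [c for c in certs
            if c.chains_to_trusted_root and c.is_ca and in_period(c, period_start)]
    if report == "webtrust":
        return base
    purpose = {"br_ev": "serverAuth", "code_signing": "codeSigning"}[report]
    return [c for c in base if eku_allows(c, purpose)]

certs = [
    CACert("TLS ICA", True, True, frozenset({"serverAuth", "clientAuth"})),
    CACert("Code ICA", True, True, frozenset({"codeSigning"})),
    CACert("Unrestricted ICA", True, True, None),
    CACert("Old ICA", True, True, frozenset({"serverAuth"}), not_after=date(2018, 1, 1)),
]
start = date(2019, 8, 1)
print([c.name for c in select(certs, "br_ev", start)])  # ['TLS ICA', 'Unrestricted ICA']
```

Note that an ICA with no EKU restriction lands in both the BR/EV and code signing scopes. This is the capability-based behaviour the incident report calls for.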

For each cert, the report lists:

  1. Certificate Subject Common Name
  2. Type (Root/Intermediate Certificate)
  3. Certificate Issuer Common Name
  4. Valid From, Valid To
  5. Technically Constrained (Y/N)
  6. Revocation Status
  7. Date of Revocation
  8. Extended Key Usage
  9. Policy OIDs
  10. Certificate Serial Number
  11. Distinguished Name
  12. SHA-256 Fingerprint
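As an illustration of the per-certificate columns listed above, here is a hypothetical row writer. It assumes the certificate fields have already been parsed into a dict, and it computes the SHA-256 fingerprint from the certificate's DER bytes with `hashlib`. The column headers mirror the list above, with "Valid From" and "Valid To" split into two columns. The exact layout that ALV and CCADB expect is not reproduced here.

```python
import csv
import hashlib
import io

COLUMNS = [
    "Certificate Subject Common Name", "Type", "Certificate Issuer Common Name",
    "Valid From", "Valid To", "Technically Constrained", "Revocation Status",
    "Date of Revocation", "Extended Key Usage", "Policy OIDs",
    "Certificate Serial Number", "Distinguished Name", "SHA-256 Fingerprint",
]

def sha256_fingerprint(der):
    """Uppercase hex SHA-256 over the DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest().upper()

def write_report(rows, out):
    """Write (parsed_fields, der_bytes) pairs as CSV; missing fields are left empty."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for fields, der in rows:
        writer.writerow({**fields, "SHA-256 Fingerprint": sha256_fingerprint(der)})

buf = io.StringIO()
# Placeholder bytes stand in for a real DER certificate in this example.
write_report([({"Certificate Subject Common Name": "Example ICA",
                "Type": "Intermediate Certificate"}, b"not-a-real-cert")], buf)
print(buf.getvalue())
```

Using the `csv` module rather than joining strings matters here. Distinguished Names usually contain commas, which the writer quotes correctly.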

The plan is to:

  1. Have development combine all the above reports into a single report.
  2. The single report will then have a download option added into Rootica.
  3. When requested, the tool will either run the report automatically using the new time parameters or let the user define the time period the information is needed for.
  4. Report will be created following the formatting requirements listed here: https://www.ccadb.org/policy so that it will be consumable by ALV.

We will probably make several tweaks to this between now and release. We expect it'll be done in about 1-2 sprints, so the next update will hopefully come when the tool is live and include a sample generated audit report. If not, I'll post an update on where we are in the development cycle.

Flags: needinfo?(jeremy.rowley)

We are not quite done. QA and review of the report has taken a bit longer than expected. We're hoping to have it done in the next week. We do have the ICA information being exported to github right now: https://github.com/digicert/reports/.

From the recently updated https://github.com/mozilla/pkipolicy/issues/147:

A CA that can issue EV certificates must have an EV audit, even if it has not issued any. From the CAs listed in red here: https://crt.sh/mozilla-disclosures#disclosedwithinconsistentaudit
Example:
https://crt.sh/?caid=1397&opt=mozilladisclosure
https://crt.sh/?caid=5886&opt=mozilladisclosure
https://crt.sh/?caid=4101&opt=mozilladisclosure

All of these are EV-capable but not covered by an EV audit. These CAs were not in this incident report, but shouldn't they be? Will they be revoked as well?

No. The requirement is that they must be audited if they are issuing EV certs, not if they are capable. See Section 3.1.2 of https://www.mozilla.org/en-US/about/governance/policies/security-group/certs/policy/.

The link you posted is a proposed policy but hasn't made it into the full policy. I wouldn't mind seeing that policy adopted.

Alright, looks like the automated audit report is working okay. I've uploaded a sample. Anything else we need to do to close this bug?

Flags: needinfo?(jeremy.rowley)

I believe this bug can be closed. I'll schedule it for closure on 21-Sept-2020.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] - Next Update - 31 Aug-2020 → [ca-compliance] - Next Update - 21-Sept-2020

Ben: I'm concerned that Comment #22 doesn't really address the concerns in Comment #21. Despite Comment #22, we've seen clear statements in https://github.com/mozilla/pkipolicy/issues/147 regarding the expectations, which contradict Jeremy's conclusions in Comment #22.

EV audits are already required for all EV-capable intermediates, regardless of whether or not those intermediates are actually issuing EV certs. That's what we've been implementing in CCADB for ALV, and it is what crt.sh has long implemented in https://crt.sh/mozilla-disclosures#disclosureincomplete

While I'd be hesitant to suggest that CAs in general MUST check that page, the fact that DigiCert has repeatedly failed to properly disclose, over a pattern of years, is a real issue, and it's been repeatedly brought to DigiCert's attention for compliance. If DigiCert had concerns with the interpretation or understanding, which is absolutely what we've been operating on, there's been ample time for them to proactively raise any concerns and request any necessary clarification. Those issues are not false positives, and that's been consistent with the approaches in the past, not just for DigiCert but for other CAs with undisclosed/unaudited intermediates.

So I don't think DigiCert's argument of ambiguity holds, especially given the systemic patterns of issues here, and the many good-faith efforts to help DigiCert get on a better plan for compliance, going back to the days of your own involvement.

Thanks, Ryan.

Can DigiCert now give a timeline for an incident report and the revocation of the other CAs?

Trying to invent a distinction between Clarifying things and Changing things is probably not helpful, especially since, despite the use of Capitalization, neither is a defined term, nor is there any policy guidance about how to tell which is which.

As Wayne said, "CAs ... can ... Is this what we want?" Which clearly acknowledges both the literal reading of the existing policy and the desire to change it. Changing the policy to clarify this issue would undoubtedly be helpful, as we said previously.

However, until the policy is changed, the only reasonable course of action is to enforce the policy as written, as that is what CAs are expected to comply with.

I am working on a more formal response to this issue, but it will take me until tomorrow sometime to post it.

I hesitate to argue or debate about how things came to be or why they are, but I feel I need to put my perspective forward in support of the position stated below.

First of all, I think we can all agree that CAs capable of issuing EV SSL/TLS certificates should be included in EV audit letters, and a clarification will be made to the Mozilla Root Store Policy as soon as possible. (See Issue #147 in Github). Also, in DigiCert’s Incident Report, it says, “Going forward, all ICAs capable of issuing TLS will be listed in both the EV audit and WebTrust audit. Since all TLS issuing CAs are also capable of issuing EV, all ICAs should be included in the EV report.” So I believe that this bug can be closed and that the clarification to be made through Issue 147 will resolve this once and for all.

Policy Provisions

Section 3.1.2.1 of the Mozilla Root Store Policy currently says, “If being audited to the WebTrust criteria, … For the SSL trust bit, a CA and all subordinate CAs technically capable of issuing server certificates must have all of the following audits: … WebTrust for CAs - EV SSL (if issuing EV certificates).”

Section 3.1.2.2 of the Mozilla Root Store Policy currently says, “If being audited to the ETSI criteria, … For the SSL trust bit, a CA and all subordinate CAs technically capable of issuing server certificates must have one of the following audits, with at least one of the noted policies or sets of policies: … An audit showing conformance with the EVCP policy is required if issuing EV certificates.”

Policy Interpretation

As Wayne Thayer noted in opening Issue #147, “This literally means that CAs with EV enabled roots can opt specific intermediates out of EV audit scope by declaring that they don't issue EV certs.” Sections 3.1.2.1 and 3.1.2.2 of the Mozilla Root Store Policy have been ambiguous because they say both “all subordinate CAs technically capable of issuing …” and “if issuing EV certificates”. The two phrases lead to inconsistent conclusions.
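To make the ambiguity concrete, here are the two readings expressed as predicates. The inputs are hypothetical booleans describing a subordinate CA under an EV-enabled root. The point is that an intermediate that can issue server certificates but has not issued EV certificates falls outside EV audit scope under one reading and inside it under the other.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subordinate:
    # Hypothetical attributes of a subordinate CA under an EV-enabled root.
    can_issue_server_certs: bool
    has_issued_ev: bool

def needs_ev_audit_if_issuing(ca):
    """Reading 1: EV audit required only 'if issuing EV certificates'."""
    return ca.can_issue_server_certs and ca.has_issued_ev

def needs_ev_audit_if_capable(ca):
    """Reading 2: every subordinate 'technically capable of issuing' server certs."""
    return ca.can_issue_server_certs

ica = Subordinate(can_issue_server_certs=True, has_issued_ev=False)
print(needs_ev_audit_if_issuing(ica), needs_ev_audit_if_capable(ica))  # False True
```

For an ICA that has actually issued EV certificates, as in this incident, both readings agree that an EV audit is required.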

Decision / Position

Taking WebTrust audits as the relevant example, a reasonable interpretation of this language might be that for the ordinary SSL trust bit, a WebTrust for CAs audit and an SSL Baseline Requirements audit would have been sufficient (and an EV audit would have been required if the CA is issuing EV certificates). DigiCert included the relevant CAs in at least the WebTrust for CAs audit and the SSL Baseline Requirements audit. I am unaware of anything to date that is a clear communication from Mozilla to CAs and auditors of our interpretation that all subordinate CAs under an EV-enabled root require an EV audit. This will change going forward with the policy clarification, see next.

It has been raised that Issue #147 is not a policy “change” but a policy “clarification” and that DigiCert “should have known”. Nonetheless, in an attempt to objectively look at an interpretation of the Policy, the current wording does have the conditional “if issuing EV certificates”, which needs to be changed to “if capable of issuing EV certificates”, and the immediate question is whether there is a basis upon which it would be reasonable for Mozilla to require DigiCert to revoke the CAs that were not included in the EV audit. Up until now it was reasonable for DigiCert to believe it did not have to include certain CAs in the scope of its EV-related audits, so I don’t believe it is reasonable now for Mozilla to require them to revoke those CAs.

Ben,

I can totally appreciate the viewpoint of "The language was confusing", but I don't think that's an appropriate or wholesome response to DigiCert here, given the surrounding evidence.

That is, as captured in Issue 147, this started off as an attempt to clarify an existing requirement and expectation. In general, these clarifications are made because they reflect long-standing practice and expectation, yet new CAs, without a history of following activity in the Forum or Mozilla, might still be confused. However, I don't think that generous interpretation applies to DigiCert.

In this case, DigiCert's interpretation of policy strains belief, and is most akin to when Trustwave issued a MITM certificate and Mozilla had to remind CAs that, no, validating domain names actually means validating domain names, and you can't issue MITM certificates. Yet even so, it was and still is seen as potentially necessary to "clarify" that MITM isn't permitted.

Ultimately, audits are for Mozilla and the community. If an audit doesn't provide sufficient assurance, Mozilla can and should request clarification. There's nothing inherent in Mozilla policy that dictates all audits must be accepted, as reflected in the tremendous work Kathleen puts into verifying auditor qualifications. The questions Mozilla should ask, which are the same questions anyone looking at this should ask, are "Do I have enough information to trust DigiCert?" and "Should I be concerned that DigiCert reached this conclusion, given the available information?". Similarly, Mozilla should also specifically consider "What are the implications for incidents, going forward, for all CAs?"

Let's consider the facts:

I can understand wanting to take the policy on its own, and if DigiCert were a new CA applying for trust, I think there'd be no issue with saying "No, sorry, try again, and we'll clarify that", because it'd not be appropriate to take on the risk, even if it was a benign mistake. However, this is not a benign mistake, given that DigiCert is a large (if not the largest) CA in Mozilla's program. Given the pattern of systemic issues, what might be a reasonable interpretation by a neophyte doesn't really hold up when examined through the lens of a mature, established CA that continues to violate expectations.

In the history of Mozilla, has there ever been an audit or communication indicating that intent, rather than capability, matters? If it does not matter for the BRs, and it does not matter for S/MIME, is it reasonable to conclude that EV is somehow exempt? And how does that assumption square with issues like the Apple issue, where DigiCert recognized the need for audits of their sub-CAs to include the full scope?

So I think we can throw out "reasonable" as, well, being unreasonable. The question then is "What's appropriate?". It seems you might be viewing this through the lens of "Revoke or do nothing" as the only options. While revocation would be entirely consistent with Mozilla's policy, I can understand that in this exceptional circumstance, which would and should never be repeated by another CA, Mozilla might see revocation as disruptive. I'd question what evidence we have of that, since such a conclusion should be supported by hard data if the goal is objectivity, but I also think this framing overlooks other options.

DigiCert has already been placed on remedial audit schedules in the past, in relation to the Symantec acquisition, and we saw that DigiCert exhibited a worrying trend of "moving quickly and breaking things"; that is, a number of troublesome issues in their development and integration that could and should have been avoided were, ultimately, revealed through the more regular audits. However, although there's considerable value, especially for DigiCert, in regular remedial audits, I don't think they're as applicable here.

Another alternative, given DigiCert's routine control failures both directly and through its integrated acquisitions (e.g. most recently QuoVadis, and the repeat violations that have occurred since DigiCert acquired it), is to require DigiCert to provide a more detailed report. Drawing on the WebTrust-provided detailed control report, better explanations of the system design, of the human processes involved at DigiCert, and of how these processes are examined seem relevant precisely to this repeat failure. The fact that DigiCert has had so many issues with their ceremonies, which are heavily scripted and provide many opportunities for human review and fail-safes, suggests that there's a culture at DigiCert that may not be attentive to the requirements involved in being a CA.

Yet another alternative is to require DigiCert to obtain new and appropriate audit reports, assuming the CAs were in fact appropriately audited. Of course, any challenges with that, either in performing the audits or in having them accepted, reveal yet another option: the removal of EV status from DigiCert, and the required establishment of a new, clean, appropriately audited EV hierarchy.

At the core, I think the conclusion here, that "it was reasonable for DigiCert to believe it did not have to include certain CAs in scope", is not supported by the ample record, both with DigiCert and within the broader set of CA incidents. If Mozilla ignores the discussions on m.d.s.p., ignores the discussions with other CAs, ignores the discussions in the CA/B Forum, and only takes the policy, and any interpretations, in isolation, which is what appears to be proposed here, then it seems there is ample room for mischief and mayhem, and Mozilla will constantly be in a reactionary mode against bad-faith interpretations of its policy.

I can't help but feel that the conclusion about whether or not this is right or expected is being shaped by an assumption that, if Mozilla says no, the effect is to expect or demand revocation. I hope I've shown that isn't the case, and that Mozilla has a wide array of options at its disposal to ensure that Mozilla, and the community that relies on Mozilla, has justified faith in DigiCert's operations.

> If Mozilla ignores the discussions on m.d.s.p., ignores the discussions with other CAs, ignores the discussions in the CA/B Forum, and only takes the policy, and any interpretations, in isolation, which is what appears to be proposed here, then it seems there is ample room for mischief and mayhem, and Mozilla will constantly be in a reactionary mode against bad-faith interpretations of its policy.

That's better than the alternative in which CAs are held to unwritten rules. Creating clear and loophole-proof policies is [of course] difficult, but that's Mozilla's burden. Rather than assuming that CAs are making excuses because they 'should have known better', despite what the policy says, work with them to fix the policy and/or formally communicate the 'correct' interpretation.

(In reply to Wayne Thayer from comment #31)

> If Mozilla ignores the discussions on m.d.s.p., ignores the discussions with other CAs, ignores the discussions in the CA/B Forum, and only takes the policy, and any interpretations, in isolation, which is what appears to be proposed here, then it seems there is ample room for mischief and mayhem, and Mozilla will constantly be in a reactionary mode against bad-faith interpretations of its policy.

> That's better than the alternative in which CAs are held to unwritten rules. Creating clear and loophole-proof policies is [of course] difficult, but that's Mozilla's burden. Rather than assuming that CAs are making excuses because they 'should have known better', despite what the policy says, work with them to fix the policy and/or formally communicate the 'correct' interpretation.

I don't think this is at odds with what I wrote, but you seem to be positioning it as either/or.

The comparison to make here is with domain validation. The BRs describe a set of domain validation methods, and we see CAs interpret how to achieve those requirements in a wide variety of ways. When a CA implements a process that is "obviously" bound for failure, and failure results, we don't blame the policies and the Guidelines for not normatively spelling out precisely how to implement it: we say the CA was given an objective and failed to achieve that objective, and that it should undertake a systemic and holistic evaluation of how to achieve that goal.

When third-party audits were introduced to Mozilla policy, they were stated as merely a factor for evaluation, with the meta-goal being a clear and consistent evaluation and, in particular, the exercise of judgement. Let's also remember the context of this issue: Comment #0 specifically began because DigiCert had EV-capable intermediates, which were issuing and had issued EV certificates, excluded from the scope of the audit. I think we should be careful about the post-hoc confusion mentioned in Comment #22, when this began with a very clear, unambiguous, systemic issue.

For example, Comment #21 notes that one of the CAs excluded from the audit scope was one of the CAs mentioned in Comment #0 as issuing EV certificates, so the proposal in Comment #22 doesn't apply. DigiCert's proposed remediation is "Revoke all the EV certs", which, as Bug 1651828 shows, is troublesome to say the least, and appears to amount to pretending the intermediate is compliant going forward.

Again, I'm not suggesting that DigiCert somehow be "punished" for its misinterpretation, but I'm suggesting that in the sum total of available data, the interpretation was not reasonable, and in the sum total of problematic patterns, there's a worrying trend. The policy confusion is subordinate to that, and the question is really about whether the path forward, given the pattern, provides sufficient assurance. And, as mentioned in Comment #30, there are a number of options that provide greater assurance given the problematic trend.

(In reply to Ryan Sleevi from comment #32)

> Again, I'm not suggesting that DigiCert somehow be "punished" for its misinterpretation, but I'm suggesting that in the sum total of available data, the interpretation was not reasonable, and in the sum total of problematic patterns, there's a worrying trend.

I think it's fair to conclude that there was plenty of evidence to support your argument that the literal ("issuing") interpretation of the policy is not sufficient, but the policy still says what it says, and it's not solely the CA's responsibility to distinguish opinions from requirements.

> The policy confusion is subordinate to that, and the question is really about whether the path forward, given the pattern, provides sufficient assurance. And, as mentioned in Comment #30, there are a number of options that provide greater assurance given the problematic trend.

Without opining on the core problem identified in this bug or any pattern of behavior, I'm suggesting that - for CAs that were technically capable but were not included in EV audit scope and did not issue EV certs - the path forward is (1) clarify the requirement, and then (2) require EV audits going forward for CAs that are technically capable of EV issuance.

In response to Ryan’s comments, we want to give the community greater assurance that we are staying focused on adequately addressing the audit gap noted in this bug. We will include all CAs technically capable of issuing EV certificates in our current WebTrust for EV audit (as well as in the WebTrust for CAs and BR audits, where they were previously included). This audit is already underway, and we have moved the audit year-end date up by a month (the prior year end was October 31st; it is now September 30th). DigiCert is obtaining new and appropriate audit reports that cover these CAs regardless of whether they have issued EV certificates.

While we understand that one aspect of this bug is focused on clarifying the policy, we support amending the following language in Section 3.1.2.1 of the Mozilla Root Store Policy to drive consistency: “WebTrust for CAs - EV SSL (if issuing EV certificates)”. Our goal in accelerating our audit is not to continue the policy debate, but to show our good-faith efforts to focus on compliance.

With regard to the past remedial audit schedules, if this refers to the point-in-time audits and the OEM quarterly/bi-annual audits that were part of the Symantec acquisition, we successfully completed those audits. The audits were inherited as part of DigiCert’s acquisition of Symantec and were implemented to ensure the Symantec infrastructure was operating in accordance with browser expectations. Now that we have finished the audits, we can begin the final decommissioning of the TLS systems per our previous posts on the subject. We expect to finish decommissioning all front-end Symantec publicly trusted TLS systems by the end of the year. We have already shut down half of these systems and are actively working on shutting down the rest.

We want to ensure we keep the community’s trust in our operations, which is why we have been diligent in finding and reporting both our own issues and issues in the industry (e.g., bad state-province/country combinations). We send these reports to the other CAs to investigate and remediate. We have established an Analytics team, mentioned in our previous posts, to diligently scan for problems in our own certificates and across the industry so we can be proactive in detecting and remediating any open issues. We have established a plan for the Analytics team to scan all certificates for compliance with not just RFC 5280 and the Baseline Requirements, but all criteria in the RFCs referenced by RFC 5280. We have asked our dev team to contribute to zlint and be part of the community working to improve the industry as a whole.

Over the last six months, we have focused heavily on automating manual processes to eliminate potential areas for human error and drive consistency across our operations. This automation includes:

  • Automating the revocation process, with the system’s initial launch planned for the beginning of October.
  • Integrating our internal CA database with CCADB via API to drive consistency in our audit scope reporting.
  • Automating the key ceremony process end-to-end, with phase 1 of the project (additional CA checks before signing) already complete. Our focus is to complete the automation for the remaining parts of the process, including intake of the initial information for the CA signing request, elimination of input and spelling errors, and automation of the workflow from request through final approval by our Compliance team. Our goal is to complete these improvements in Q4 of this year.
  • Improving end-to-end handling and tools around compromised keys and revocations.
  • Consolidating our validation and issuance flows as part of our overall migration from legacy / acquired systems.
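The CCADB integration item is essentially a reconciliation problem: the internal CA inventory and the disclosed audit scope should agree. The sketch below illustrates that check under stated assumptions. It does not call any real CCADB API; both inputs are hypothetical dictionaries assumed to have been exported already, and the audit labels "CA", "BR", and "EV" are illustrative shorthands for the WebTrust for CAs, Baseline Requirements, and EV audits.

```python
def reconcile(internal, disclosed):
    """Compare an internal CA inventory against disclosed audit scope.

    internal:  {ca_name: technically_capable_of_ev}  (hypothetical internal DB export)
    disclosed: {ca_name: set of audit labels}         (hypothetical CCADB-style export)
    """
    return {
        # CAs we operate that are not disclosed at all.
        "not_disclosed": sorted(set(internal) - set(disclosed)),
        # Disclosed CAs that our internal inventory does not know about.
        "unknown_internally": sorted(set(disclosed) - set(internal)),
        # Disclosed, EV-capable CAs whose audit scope lacks an EV audit.
        "ev_capable_missing_ev_audit": sorted(
            name
            for name, capable in internal.items()
            if capable and name in disclosed and "EV" not in disclosed[name]
        ),
    }


internal = {"ICA-A": True, "ICA-B": True, "ICA-C": False, "ICA-D": True}
disclosed = {
    "ICA-A": {"CA", "BR", "EV"},
    "ICA-B": {"CA", "BR"},
    "ICA-C": {"CA", "BR"},
    "ICA-X": {"CA"},
}
print(reconcile(internal, disclosed))
# {'not_disclosed': ['ICA-D'], 'unknown_internally': ['ICA-X'], 'ev_capable_missing_ev_audit': ['ICA-B']}
```

An EV-capable CA that is missing from the disclosures entirely (ICA-D here) shows up under `not_disclosed` rather than being counted twice.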

In summary, while our current auditors have told us we cannot restate our past audits, the audit now underway will include all technically capable and issuing CAs in scope for this audit year end. We expect to have those reports published by the end of this year. We will continue to be transparent and give relying parties the assurances they need about our operations. We’ve made meaningful progress since the Symantec and QuoVadis acquisitions, learning from our bugs, implementing technical improvements, and eliminating manual processes where we can to prevent future issues.

Based on DigiCert's sua sponte reporting of this incident and its most recent response, Comment #34, I again want to close this bug and intend to do so on or about 7-Oct-2020. DigiCert indicates that its audit year has now closed (Sept. 30th) and that their auditors are already including the EV-capable CAs in the scope of the EV audit. As noted in my email today to the mdsp list, Mozilla is going to resolve Issue 147 (https://github.com/mozilla/pkipolicy/issues/147) and clarify, once and for all, that CAs capable of issuing EV certificates must be included in EV audits. DigiCert indicates that they support this effort. Finally, DigiCert also listed various efforts being undertaken to ensure better compliance in this area with automation, which will improve recordkeeping and reporting of CAs.

Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED