Open Bug 1883843 Opened 2 months ago Updated 4 days ago

Entrust: EV TLS Certificate cPSuri missing

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: paul.vanbrouwershaven, Assigned: paul.vanbrouwershaven)

Details

(Whiteboard: [ca-compliance] [ev-misissuance])

Incident Report

Summary

Our team received a report from Ryan Dickson (in a personal capacity) about 10+ certificates that seem to be missing the required certificatePolicies:policyQualifiers:qualifier:cPSuri.

Impact

The misalignment between the TLS Baseline Requirements and the TLS Extended Validation Guidelines led to the mis-issuance of all TLS EV certificates issued by Entrust since the changes for Ballot SC-62v2 were implemented.

Timeline

All times are UTC.

2024-03-04:

  • 13:00 Report received from Ryan Dickson (in a personal capacity)
  • 14:41 Requested investigation from Incident Review Team

2024-03-05:

  • 10:42 Mis-issuance confirmed, requested a report of all impacted TLS EV certificates
  • 11:30 Confirmed that both the Code Signing and S/MIME requirements do not require the cPSuri.

2024-03-06:

  • 08:00 Publication of this incident report

Root Cause Analysis

The mis-issuance of EV TLS certificates occurred due to a discrepancy between the updated Certificate profiles in the TLS Baseline Requirements following Ballot SC-62v2 and the TLS Extended Validation Guidelines and the lack of cross-reference checks during the implementation.

  • We implemented this new “recommendation” as best practice without verifying if this would be compliant with other requirements/guidelines.
  • Lack of alignment between the different documents produced by the CA/Browser Forum.
  • Ballot SC-62v2 shifted policy qualifiers from MAY to NOT RECOMMENDED in the TLS Baseline Requirements, without considering the implications on Extended Validation Guidelines or other documents.

Lessons Learned

What went well

  • The mis-issuance reported by an external reporter was handled and escalated correctly.

What didn't go well

  • Certificate profile changes were reviewed in the wrong context.
  • While we have used pkilint on other certificate types, we should have tested with EV certificates as well.

Where we got lucky

  • The certificatePolicies:policyQualifiers:qualifier:cPSuri is not required by the Code Signing and S/MIME requirements.

Action Items

Action Item | Kind | Due Date
Deployment of pkilint as post-issuance linter in addition to existing linters | Detect | 2024-04-01
Propose a ballot to align the EVG with the TLS BRs | Prevent | 2024-04-01
Propose a BR of BRs that would help to avoid discrepancies | Prevent | Done

Appendix

Details of affected certificates

Will you be including the complete certificate data of the misissued certificates? This report doesn't include the number of certificates misissued or any dates related to the misissuance.

Did you stop issuance at any point? Did you restart issuance at any point?

What linters do you currently use?

Have Entrust noticed they are still actively mis-issuing certificates?
e.g: https://crt.sh/?id=12348828564

Is there a reason they haven't stopped? I would have expected that adding many more certificates to a presumably-large volume of certificates that will need revocation within 5 days is not a good thing.

We have not stopped issuance, and we are not planning to stop issuance or to revoke the certificates issued. We think that this misalignment between the Baseline Requirements and the EV Guidelines was an unintended oversight of SC-62v2, as explained in the root cause analysis. Revoking these certificates would have an unnecessarily big impact on our customers and the WebPKI ecosystem overall.

The reason policyQualifiers were marked as NOT RECOMMENDED is that they add bytes to the certificates for information that is already available from the issuing CA, and this was seen as a bad practice. We do not want to re-introduce a bad practice on the basis of a misalignment between documents.

During the investigation we identified 24819 EV TLS certificates with the cPSuri missing; as we have not stopped issuing, this number will keep growing. We don't mind uploading a list of these certificates, but we do not think this will add any value in this particular case, given that they are all available through CT.

Entrust has implemented several linters, and performs pre and post issuance linting with one or more linters (including zlint and a custom linter). This particular issue is only being detected by pkilint, which was already scheduled to be added to our post issuance checks as also indicated in the action items.
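As a rough illustration of the kind of check involved, the sketch below looks for the DER-encoded id-qt-cps OID (1.3.6.1.5.5.7.2.1), which identifies a cPSuri policy qualifier, anywhere in a certificate's bytes. This is a byte-search heuristic and is not how pkilint or zlint work; the function names and the sample byte strings are hypothetical, and a real linter would parse the certificatePolicies extension properly.

```python
import base64

# DER encoding of OID 1.3.6.1.5.5.7.2.1 (id-qt-cps):
# tag 0x06, length 0x08, followed by the eight value bytes.
ID_QT_CPS_DER = bytes([0x06, 0x08, 0x2B, 0x06, 0x01, 0x05, 0x05, 0x07, 0x02, 0x01])


def pem_to_der(pem_text):
    """Drop the PEM armor lines and base64-decode the remaining body."""
    lines = (line.strip() for line in pem_text.splitlines())
    body = "".join(line for line in lines if line and not line.startswith("-----"))
    return base64.b64decode(body)


def has_cps_uri_qualifier(der_bytes):
    """Heuristic only: True if the id-qt-cps OID occurs anywhere in the bytes."""
    return ID_QT_CPS_DER in der_bytes


# Made-up byte strings standing in for certificates (not valid X.509).
with_qualifier = b"\x30\x10" + ID_QT_CPS_DER + b"\x16\x04test"
without_qualifier = b"\x30\x06\x06\x04\x55\x1d\x20\x00"
print(has_cps_uri_qualifier(with_qualifier), has_cps_uri_qualifier(without_qualifier))
```

A substring match can in principle produce false positives, so a check like this would only be useful as a quick triage step ahead of a proper linter run.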

A draft ballot SC-72 to fix the source of the problem has been created and we are actively looking for endorsers to move the ballot forward.

That's not really how this works though.

We recently had another CA that made pretty similar arguments, specifically in comment 2 of https://bugzilla.mozilla.org/show_bug.cgi?id=1872374

It is expected that CAs follow Bugzilla and apply lessons from them to their operations and future incidents. Unless I'm mistaken, this is rehashing the same arguments that were covered pretty in depth in various incidents before this.

At this point, there is an expectation that

  1. You stop mis-issuing these certificates knowingly, under the current BRs.
  2. File an incident for failing to revoke within 5 days.

If the ballot is adopted, it is not retroactive to existing certificates. Entrust should not be actively opting into mis-issuing certificates under the current rules.

In agreement with Amir - you cannot simply choose to opt-out of the requirements whenever you see fit.
You have admitted in the original note that the certificates are mis-issued. The reporter from the Google team reported them as mis-issued.
It's clear what action should be taken, which is to stop issuance and to revoke the mis-issued certificates.
It doesn't seem like other CAs are mis-issuing the same way, suggesting the problem isn't with the wording of the baseline requirements, rather Entrust's failure to follow them.

NOT revoking would be a bigger impact to the webPKI as yet again, another 'trusted' CA would be ignoring the rules and regulations whenever they choose. I hope Mozilla are taking this seriously and considering if a CA who wilfully ignores guidelines is one that should be trusted (especially one with a comparatively small certificate volume such as Entrust).

Other CAs are still able to demonstrate not only that they follow guidelines, but they can revoke multi-thousand volumes of certificates within recommended periods of 5 days or even 24 hours. A CA that decides to ignore guidelines and not do this should have their position of trust called into question.

I agree with Amir and JR, though I believe that Symantec's response to their misissuance offers a far more apt comparison.

The Mozilla policy is also pretty clear regarding misissuance, stating, "In misissuance cases, a CA should almost always immediately cease issuance from the affected part of its PKI."

It also seems odd that Entrust is part of an organization that has claimed the 1-bit of entropy issue experienced by Let's Encrypt, Apple, and Google was a serious security issue and criticized Let's Encrypt for not revoking some of those affected certificates due to their short-lived nature and the inability to contact their owner.

The current response from Entrust appears to be:

  1. We admit to misissuing these certificates,
  2. We may have even been advocates of this rule we were not complying with,
  3. But despite this, we are not going to stop issuing and change our behavior to match the requirements despite clear requirements and precedent,
  4. We are also not going to revoke, despite clear requirements and precedent,
  5. And we are going to try to get the rules changed to accommodate this behavior.

At best, this seems wholly inconsistent with past handling of incidents and at worst, a case of what is good for the goose isn't good for the gander.

Establishing this type of response as acceptable would be actively harmful to the WebPKI and make enforcing more serious issues nearly impossible in the future.

I am not happy that:

Even if that PR is merged today (which won't happen), these changes are not retroactive.

I also expect that the Vice-Chairperson of the CA/B forum doesn't try to justify breaking the rules made by that same forum.

I just realized, I misspoke, the issue the organization in question spoke about was not the 1-bit of entropy issue that impacted Google and Apple but the 1-second extra validity that impacted Let's Encrypt. The point remains the same though.

We firmly believe that not revoking and continuing issuance do not harm the security, reliability and compatibility of the ecosystem, or harm the users in some other way. Doing the opposite will do more harm to the ecosystem, as also stated in the note below, which was added to the TLS Baseline Requirements as part of the discussion around policyQualifiers in the review of SC-62v2:

Note: policyQualifiers is NOT RECOMMENDED to be present in any Certificate issued under this Certificate Profile because this information increases the size of the Certificate without providing any value to a typical Relying Party, and the information may be obtained by other means when necessary.

We would like to be advised if anyone sees how the missing policyQualifiers in these mis-issued certificates will harm the ecosystem.

This bug impacts over 24000 EV certificates, involving many thousands of customers. These certificates are generally not managed through automation and used by large enterprises including many financial institutions and governments. Reissuing this number of EV certificates will require significant resources and coordination with our customers and their relying parties.

Entrust supports automation and the ongoing work to improve how the issuance and management of EV certificates can be better automated. Besides the work in the CA/Browser Forum we are the authors of two IETF drafts around ACME auto discovery that should help to increase the adoption of automation for all types of certificates including those obtained under commercial conditions.

We think that it’s better to spend our and the industry's efforts on improving automation (like the auto-discovery work in IETF mentioned above), on avoiding these kinds of issues through efforts such as the BRs of BRs work (which we have been leading), and on improving the CA/B Forum documents and infrastructure, such as through better automation.

If Mozilla and the community expect Certificate Authorities (CAs) to allocate their valuable time to matters driven solely by principle, we may be moving in the wrong direction and inadvertently hindering ecosystem progress. We strongly believe that demanding revocation of these certificates will do more harm to the trust in root programs, browser vendors, and the CA/Browser Forum than it will help CAs doing better in the future.

If that is your position, then you're still required to open a new incident for failure to revoke. These two are entirely separate matters, and need to be handled separately.

"If Mozilla and the community expect Certificate Authorities (CAs) to allocate their valuable time to matters driven solely by principle, we may be moving in the wrong direction and inadvertently hindering ecosystem progress"

The community expects that the self-regulation the CAs and Browsers are currently doing is actually followed by CAs and Browsers. If the CA/B is unable to regulate itself, then the community will have no choice but to involve state authorities in this matter, and that will hinder the ecosystem progress even further.

"We strongly believe that demanding revocation of these certificates will do more harm to the trust in root programs, browser vendors, and the CA/Browser Forum than it will help CAs doing better in the future."

Again, the demand here is that you open an incident for failure to revoke. The semantics of revocation or lack thereof can be discussed in that incident.

This incident is particularly painful because there is a trivial way for Entrust to not be in violation of any of the requirements: begin including the cPSURI field in the certificates again, until such time as their proposed ballot passes and goes into effect. The BRs' "NOT RECOMMENDED" does not override the EVGs' "MUST". The fact that Entrust is willfully continuing to violate the EVGs simply because they think the EVGs should be updated is not in line with the expectations of this community and sets a terrible precedent.

I call upon Entrust to stop misissuing certificates immediately, given that it is easily within their power to do so without requiring issuance to be halted at all.

I also call upon Entrust to directly address in their "Failure to Revoke" incident why 24,000 certificates is too many to revoke when other CAs have successfully revoked 2.7 million certificates in 5 days, and what steps they're taking to be able to easily revoke that many certificates the next time an incident like this occurs.

It's important to remember that the "public" aspect of PKI means that the public is delegating its trust to the root CAs. You serve us--the public--over your subscribers. We don't want to end up in a situation whereby subscribers are incentivized to choose CAs that are more likely to push the boundaries of the BRs.

It's telling that the only entity advocating for Entrust's decision to continue mis-issuance thus far has been Entrust itself.

You need to open a separate incident for your failure to revoke: that's the process. It's what all CAs are expected to do, even if the failure to revoke was justified.

"This bug impacts over 24000 EV certificates, involving many thousands of customers. These certificates are generally not managed through automation and used by large enterprises including many financial institutions and governments. Reissuing this number of EV certificates will require significant resources and coordination with our customers and their relying parties."

Are you saying that Entrust is currently not in a position to revoke and reissue its entire certificate population within 24 hours, as per the requirements?

If the reason was 24,000 * keyCompromise, would you still not be in a position to do that, because of the lack of automation, significant resource requirements, and coordination with customers and relying parties?


I am personally worried by the choice to keep misissuing certificates knowingly, especially when a fix would be as simple as Comment #12 where Aaron suggests something reasonable and seemingly easy. I'm also worried by Entrust not following the rules & requirements for CAs, such as the filing of new incidents, lack of knowledge regarding retroactive application of rules, etc. Finally, I see that perhaps Entrust is not currently in a position to uphold their requirements, for various reasons, such as capacity planning, etc.

One more quick comment:

Since I understand that you're still misissuing certificates at this time, I would like to remind you of the incident response requirements, which state:

in the case of incidents which directly impacted certificates, the Appendix must include a listing of the complete certificate details of all affected certificates.

You'll therefore need to keep updating this incident (as well as the delayed revocation incident you will file, and any other additional incidents), with the list of all certificates you're still issuing. The list has to be complete so we can better assess the impact to the ecosystem and be able to mitigate it more effectively.

Thanks!

From a Mozilla root program perspective, we rely on the CABF requirements, so we would like Entrust to (1) stop issuance and fix the EV profile, and (2) file a separate bug for delayed revocation that explains its plans regarding the EV certificates that have been mis-issued. This approach is aimed at maintaining consistency among responses to problems that arise because of CABF requirements (including profile errors that might not have any security impacts). Our guidance on this issue can be found at https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation, which states in part, "Mozilla recognizes that in some exceptional circumstances, revoking the affected certificates within the prescribed deadline may cause significant harm, such as ... when the volume of revocations in a short period of time would result in a large cumulative impact to the web" and "your CA is ultimately responsible for deciding if the harm caused by following the requirements ... outweighs the risks that are passed on to individuals who rely on the web PKI by choosing not to meet this requirement." The web page linked above also has a bulleted list of items that are expected to be addressed in the separate "delayed revocation" bug to be filed. Finally, if our guidance on these types of issues could be improved in any way, please let me know. Thanks, Ben

"We would like to be advised if anyone sees how the missing policyQualifiers in these miss-issued certificates will harm the ecosystem."
It is not so much about 'harm to the ecosystem' as about Entrust's willingness to ignore the rules and regulations so boldly.
It is concerning that Entrust is disregarding rules and regulations so brazenly.

It is not acceptable to demand proof of harm caused by breaking the rules. This lack of accountability raises questions about what else Entrust may not be capable of.
Without addressing and resolving the issue of mis-issued certificates and revocations promptly, trust in Entrust should be, and is, at risk.

Other Certificate Authorities have successfully completed mass revocations in a timely manner, as evidenced by examples such as:

https://bugzilla.mozilla.org/show_bug.cgi?id=1883731
and
https://bugzilla.mozilla.org/show_bug.cgi?id=1883620

It appears that Entrust may not have an adequate solution in place for quickly revoking and issuing new certificates for its customers. That is an Entrust problem.
This should not be allowed to become a problem for the web-PKI and internet users worldwide.

I anticipate Entrust will cease issuing certificates and rectify the situation over this weekend, providing a comprehensive list of mis-issued certificates.

If we do not, I would suggest the community call into question the trust of Entrust within Mozilla and other root programs who monitor this Bugzilla.

My own research of the Certificate Transparency system shows some of the customers who will have to revoke are:

JPMorgan Chase and Co.
Delta Air Lines, Inc.
Bank of America Corporation
Tesco PLC
Fidelity Investments (FMR LLC)
American Airlines Inc
Westpac Banking Corporation
ING Groep N.V.
Interactive Communications International Inc
Experian Limited
ING-DiBa AG
DBS Bank Ltd
Pricewaterhousecoopers LLP
Entrust Corporation
The Toronto-Dominion Bank
Experian Information Solutions, Inc.
Global Payments Inc.
M&T Bank Corporation
Citizens Financial Group Inc
IDFC First Bank Limited
The Travelers Companies Inc
BNP PARIBAS SA
Pole Emploi
BNP Paribas Fortis SA
Otsuka America Pharmaceutical, Inc.
Queen's University at Kingston
Huntington Bancshares Incorporated
Nedbank Limited
Clydesdale Bank PLC

I would like to echo JR Moir's point here. Also, Entrust is still actively and knowingly mis-issuing certificates. Entrust has now had 10 days to mitigate this problem.

I also question the trust given to Entrust at this point. This incident has gone from a simple mistake that could've been simply mitigated to an active disregard of the rules set by the CA/B, and combative behavior when asked to follow said rules. As a stakeholder in the web and public PKI ecosystem, and as someone who used to work on public CA systems, I do not see a reasonable justification for this inaction.

If you are not able to fix the certificate profile in a timely manner, this raises two points:

  1. Why not?
  2. Why have you not turned off EV certificate issuance while waiting to be able to fix the certificate profile?

One of the primary reasons these incidents are useful, even if they seem silly, is that they help us find ways to improve the ecosystem even further. For example, without this incident: https://bugzilla.mozilla.org/show_bug.cgi?id=1715455, we wouldn't have ARI as an active IETF draft, a draft that is currently being implemented in various software and organizations. Even though that incident seemed silly, and posed no realistic danger to the ecosystem, it was still recognized that we do in fact need the ability for rapid response in the ecosystem.

What's very disappointing is that this incident could've led to similar very useful discussions and conclusions. However, unfortunately, due to the responses it's turned into a forum where we're pleading with Entrust to follow the rules. That is a situation that we, as public stakeholders, should never have to be in.

Hi Paul,

Thanks for filing this bug. Since submitting the problem report, I have returned from parental leave. From this point forward, all of my responses should be considered on behalf of the Chrome Root Program.

The Chrome Root Program Policy describes an expectation that CA Owners included in the Chrome Root Store adhere to the CA/Browser Forum Baseline Requirements. Beyond this, we expect CA Owners to be accountable for their actions and prioritize remediation when notified of something that hasn’t gone right.

Entrust's initial reporting of this incident and subsequent response fails to meet our expectations.

General Comments:

  • As with all bugs, we expect Entrust to respond to comments and questions presented by other community members in this bug (described on CCADB.org).

  • When submitting future incident reports, please be sure to closely follow the incident reporting guidelines available on CCADB.org. For example, the initial report fails to describe the number of affected certificates related to the incident or whether Entrust ceased issuance. It also does not describe whether the report was intended to be a preliminary or final report. While many of these questions were answered later in the report through comments or in response to community member questions, community members should be able to expect and rely upon a consistent reporting format.

  • The security standards enforced through the CA/Browser Forum are not guidelines but requirements intended to ensure the web's safety and trustworthiness. The argument that customer impact justifies deferring or delaying necessary security measures and agreed-upon practices overlooks the fundamental purpose of these standards. Instead, CA Owners must prepare for and mitigate such impacts in advance through proper planning and communication, and they must maintain capabilities to act immediately when necessary.

  • CAs are trusted based on their commitment to following the BRs and root program policies. This means the CA Owner, not its customers, needs to design its systems to support and ensure compliance. Entrust should have prepared for this situation and designed its systems accordingly so that each customer could promptly replace their certificates if necessary.

  • As a publicly trusted CA Owner, Entrust is responsible for the collective trust of internet users, not just your direct customers. Failure to take corrective action based on well-agreed-upon expectations and practices undermines the security and trustworthiness of the web as a whole.

  • This incident and the described impact of certificate revocation were entirely avoidable.

  • Entrust’s failure to cease issuance and correct its issuance profile(s) upon learning of this incident demonstrates questionable judgment which has amplified the issue's negative impact on its customers and their relying parties.

Updates Requested:

A. The Timeline Section does not describe when Entrust provided a preliminary report on its findings to the affected Subscribers and the entity that filed the report, as required by Section 4.9.5 of the TLS Baseline Requirements (below). As the reporter of this issue, I did not receive what I would consider a preliminary report within 24 hours of my submission. It’s also unclear whether Entrust contacted the subscribers affected by this incident. A separate bug should be opened due to failing to respond to a Certificate Problem Report in a complete and timely manner. Given these concerns, describing Entrust’s response to the third-party problem report as “what went well” feels like a mischaracterization.

TLS BR Section 4.9.5: Within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both the Subscriber and the entity who filed the Certificate Problem Report. After reviewing the facts and circumstances, the CA SHALL work with the Subscriber and any entity reporting the Certificate Problem Report or other revocation-related notice to establish whether or not the certificate will be revoked, and if so, a date which the CA will revoke the certificate. The period from receipt of the Certificate Problem Report or revocation-related notice to published revocation MUST NOT exceed the time frame set forth in Section 4.9.1.1.
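Taking the 13:00 UTC receipt time on 2024-03-04 from the timeline in the incident report, the deadlines work out as sketched below. The 5-day window is an assumption taken from the revocation period discussed in earlier comments; which Section 4.9.1.1 time frame applies, and exactly when its clock starts, is not settled by this sketch.

```python
from datetime import datetime, timedelta, timezone

# Receipt time of the Certificate Problem Report, per the incident timeline.
report_received = datetime(2024, 3, 4, 13, 0, tzinfo=timezone.utc)


def deadline(received, window):
    """Return the latest allowed time for an action, given a time window."""
    return received + window


# Section 4.9.5: preliminary report within 24 hours of receipt.
preliminary_report_due = deadline(report_received, timedelta(hours=24))

# Assumed 5-day revocation window, counted from receipt of the report.
revocation_due = deadline(report_received, timedelta(days=5))

print(preliminary_report_due.isoformat())  # 2024-03-05T13:00:00+00:00
print(revocation_due.isoformat())          # 2024-03-09T13:00:00+00:00
```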

B. The Root Cause Analysis Section fails to identify this issue's actual root cause. I interpret your response to emphasize that a “discrepancy” within the CA/Browser Forum policies was largely responsible for this incident. Given that the policy documents are not responsible for updating Entrust’s CP/CPS or configuring and testing Entrust’s issuance systems in a manner that’s inconsistent with the EVGs, the emphasis on the discrepancy should be reconsidered. While it may have contributed to the incident, it is not a root cause. You could try the “5 Whys” methodology observed in 1878106.

C. The “Action Items” list does not directly address elements of the “What didn’t go well” list. It’s not clear how Entrust will ensure future profile changes are reviewed in the proper context or how all types of certificates will be evaluated successfully using linting tools. It’s also unclear how the “BR of BRs,” which, as I understand it, will still result in the generation of individual BR policy documents for a particular use case (like TLS), will prevent future incidents. Providing more details, including methods for the community to quantify whether each of the remediation tactics was successful, would be helpful.

D. The “Appendix” fails to list the affected certificates. Guidance on CCADB.org states, “In particular, in the case of incidents which directly impacted certificates, the Appendix must include a listing of the complete certificate details of all affected certificates.” This was also requested in Comment 1. We expect you to provide the list of all affected certificates. For as long as Entrust continues to misissue certificates related to this specific issue, we expect this list to be updated in a timely manner (i.e., minimally updated weekly).

Questions:

Q1) Section 4.9.1.1 of the Baseline Requirements unambiguously specifies the expectation for revocation in cases such as this, yet this narrative was excluded from Entrust’s initial incident report. Revocation was only discussed after being described as required in Comment 3. Can you share why this was omitted?

Q2) Root programs have made it clear several times that impacting customers is an unacceptable justification for a failure to act, especially considering the prescribed responses defined in the BRs. Please share why you believe this is now appropriate.

Q3) Why did Entrust decide it was in its customers' best interest to continue misissuing certificates after being notified of this issue rather than stopping issuance and correcting its issuance profile(s)?

Q4) Why did Entrust prioritize updating the Baseline Requirement EV Guidelines ahead of changing its issuance practices when the expected result of the BR update would have no bearing on the existing misissued certificates?

Q5) Can you describe whether Entrust uses pre-issuance linting and why that was not considered part of the remediation of this incident? Relying solely on post-issuance linting seems to leave an opportunity for future incidents, whereas pre-issuance linting presents an opportunity to prevent them.

Q6) Can you describe how Entrust evaluates linting tools to fully comprehend each one's scope, capabilities, and limitations, including as updates are made available?

Q7) Can you describe how Entrust validates linting tools are working as expected?

Q8) How does Entrust’s handling of this incident better prepare the affected subscribers to respond to potential security events in the future, such as key compromises?

Q9) What specific steps will Entrust take with these affected customers so that they can recover from this type of incident in the future in a manner that’s fully consistent with the BRs' expectations, and on what timeline?

Q10) Given the actions described in this bug, why should the community expect Entrust to behave differently in the future (i.e., in a manner that’s fully consistent with the BRs) if a similar incident were to be repeated?

Q11) Can you describe how something like “ACME auto discovery” is intended to improve agility and response to future incidents that necessitate certificate re-issuance, given that the affected customers have not adopted automation today? Given the wide availability of existing automation capabilities across the ecosystem, why should we expect auto discovery to make a difference for the affected customers?

Flags: needinfo?(paul.vanbrouwershaven)

We have stopped issuing mis-issued certificates and fixed the EV certificate profile.
All impacted customers will be advised that their certificates will be revoked.
We will create a delayed revocation bug and will follow up on other questions in the next few days.

So you managed to stop the mis-issuance in less than 10 hours, but only once Google intervened? Before then, you simply didn't want to stop?

We are still owed a list of all affected certificates, here on this bug.

Trust in Entrust should be removed.

Please find attached a list of the 24246 affected certificates.

Can Entrust explain why their list of affected certificates contains only 24,246 entries, when an external query over publicly-available data shows at least 25,086 entries? (And that's an underestimate, not including misissuance from other Entrust-operated CAs.)

Specifically, the list provided by Entrust does not appear to contain an entry for this certificate, among others: https://crt.sh/?sha256=00057D388D18A03A189536323534719C968965EAD0F2A9FCED2D40EA765CDB1F
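A comparison of this kind can be sketched as a set difference over normalized SHA-256 fingerprints. This is a minimal sketch, not the query actually used; the function names and the made-up reported entry are hypothetical, and only the second observed fingerprint comes from the comment above.

```python
def normalize(fingerprint):
    """Uppercase a hex SHA-256 fingerprint and drop colons and whitespace."""
    return fingerprint.replace(":", "").strip().upper()


def missing_from_report(reported, observed):
    """Return fingerprints seen externally but absent from the CA's list."""
    reported_set = {normalize(fp) for fp in reported if fp.strip()}
    observed_set = {normalize(fp) for fp in observed if fp.strip()}
    return sorted(observed_set - reported_set)


# Hypothetical inputs: one made-up reported entry, plus the fingerprint cited above.
reported = ["AA" * 32]
observed = [
    "aa" * 32,
    "00057D388D18A03A189536323534719C968965EAD0F2A9FCED2D40EA765CDB1F",
]
print(missing_from_report(reported, observed))
```

Normalizing case and separators first avoids reporting an entry as missing merely because the two sources format fingerprints differently.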

We have identified this same issue and are currently waiting on an updated report that includes revoked and expired certificates.

Please find attached the corrected list of the 26641 affected certificates, which includes all revoked and expired certificates.

A question for the timeline reporting:

Please let us also know exactly when in the timeline of this incident, you started creating a list of affected certificates.

The original list of impacted certificates was requested on 2024-03-04, as noted in the timeline:

10:42 Mis-issuance confirmed, requested a report of all impacted TLS EV certificates

When uploading the updated list of certificates here, we used the wrong report, containing only the certificates that need to be revoked, instead of the report containing both expired and revoked certificates. This also explains the lower number of certificates mentioned in comment 23 compared to comment 4.

We have just identified another issue with the list in comment 25: a few certificates that were included earlier in comment 22 are missing from this report. I have requested a corrected report and will upload it here once it's available.

By my count the updated file contains only 26,626 affected certificates, as 15 of the rows contain only the content "NULL".

It also does not appear to include any of the following fingerprints, which represent certificates issued by Entrust EU and AffirmTrust (some of these do not appear in crt.sh, but many of them do and all of them appear in censys):
https://crt.sh/?sha256=024B9CB2D9801ADBEE0A0BDDE9C422A285A456D0533B6E6E143A3B4A0B863DFD
https://crt.sh/?sha256=068FED5B8DC5073B5954744F009E2624C9F7B6A49D9C258218C9451D9384604A
https://crt.sh/?sha256=0BD9972702A33E7D5C6DF844012E9C02FBA363507DE384A32D66341E6CBA43DA
https://crt.sh/?sha256=443AA0807FE2E2F9E8E4B2394097157D3E5205B86FB060289F58C608E176D698
https://crt.sh/?sha256=50A9E4ED90AD042CB4DF414E42F6A7638188E4E29D6522FFBC31856031F02273
https://crt.sh/?sha256=6808F36B917F05540A74B2C764E243066DE3CAA7ECF6D31E232F7ED390079645
https://crt.sh/?sha256=7D347FF525775C80DD520A1BE91A8DF145113689B3D7063319A4B1A8C31A570B
https://crt.sh/?sha256=8478B7F7B471B9E4BF5B1863C88408FBC74DBD03FE84D5B1CA7225CD5140486F
https://crt.sh/?sha256=A6745632C7400C8B9AB84E6B6EC7C560FDEC36534FFDC664B311D5F917AF72AB
https://crt.sh/?sha256=ABFF3E12F86CA91A64D911D475B2E672C09ED64BC86BBDC2512304FF82310042
https://crt.sh/?sha256=B3D901072F4595311DCCB3B74D055BFF6BBAB355213ECB223F50C30513DAEB36
https://crt.sh/?sha256=DA5783D2E7ECAE529F18F035171D3828205AA1885987379A2C31C24087032EC3
https://crt.sh/?sha256=F48F8230A387467AB4AC89F1540853FFD38D244E670C56BD552138487A10083F
https://crt.sh/?sha256=F51DFE415240820F93E1D0AA9BB8F5E6F97F48309AB6EF15BFE1BE87BAADC242
https://crt.sh/?sha256=1C8121586767CE23BE7A1C1310B358E9BEE1952DBFD4E57E0E702135518DF79D
https://crt.sh/?sha256=1E4FDC0064DA51F392E9D8C16ADE85071C5649588DD0B357A204C777B1D40546
https://crt.sh/?sha256=2FE9131F1AED2DE3FCDE03145F76D15954EED788A9E5CF844EEA4B4F8B057F09
https://crt.sh/?sha256=383FBECF727C28966CD8237BA88D3C2C0C07C6867DAED0EF81EEAA2079346762
https://crt.sh/?sha256=701F12119DC0F05B128552296DEDB114771116DFCB4B564945FD25B801A185C7
https://crt.sh/?sha256=7727876931378E1EE13624741F080217F12CD78E73C2DE0964A6675744F89D28
https://crt.sh/?sha256=7A2160C1B46040BA7E3D42C833C1BB5B99A13BE7EDA5B0487E082FAC6DE69C60
https://crt.sh/?sha256=84C04B308536B4478773079837E148AE4F08AA81D46CFD99F9E39B53FB8FF9EA
https://crt.sh/?sha256=87620795E308F791B598B357F88CEFB7DCF21777FBD35A49924C32B6DF411ECB
https://crt.sh/?sha256=9A119421250819DD7705B6BDE94F00C581856CD24B0E47E2D8A01D68F459F4EE
https://crt.sh/?sha256=B0BAEE28FE83DEADCA7EF301F9D26A8709941EEC0FB27F4C019323B49907506B
https://crt.sh/?sha256=B2B94BF75247F6DE1FAE7F2BC586F9E60D978A14E99F93A0CD864C2F252E7CB9
https://crt.sh/?sha256=B4839C0988E8BE2A2B1144D8A24427AAB338B73C23C9090B97264FC73E078B41
https://crt.sh/?sha256=BF5B8A98CF51919347010FF8304C7557F40A2298D5BF2DB0ECEAFAB0C630C048
https://crt.sh/?sha256=C6A5FA13D47EDB1A3E891520B3395AFD63FE9E738B050F34ADEE933AE8509876
https://crt.sh/?sha256=D5782188A27CF6E4E4EF71AFFF5CEE8683624B360D9E91C8ADB87EB40C5274A3
https://crt.sh/?sha256=D7B7B2032790517D3F93848D0BB431650E17B36051169DF44EEBFDEE2DCC2A19
https://crt.sh/?sha256=D81395DE565D02B1ACC5A61785A321B439D572769BFD2197A01747A499E3B61C
https://crt.sh/?sha256=E12BA3FCA858333FD9D5BE50504544F9BDCB5F9A7BBEDA279C5FFF963AFDE9CF
https://crt.sh/?sha256=E669CF805C9355F0F648CD1B66259159B666A53A3F2E97961B47FD511F8DF576
https://crt.sh/?sha256=F6226F8320553BF5F8DD96A200BC6F621426301081159EBB4D0B7350B23AD7EE
https://crt.sh/?sha256=FD86924D75FC6CE693575C443770A132D1D217DD764ABAFD0F7D793DFA68FC1A

edit: I see Paul entered a comment at the same time as me. I will note that the fingerprints I have provided in this comment did not appear in the first list of affected certificates either.
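The kind of cross-check described above (dropping "NULL" rows, then looking for fingerprints that appear in external data but not in the CA's report) can be sketched as follows. This is a minimal illustration, not the method actually used by anyone in this bug: the function names, the assumed row formats (a bare hex fingerprint or a crt.sh URL containing `sha256=`), and the placeholder fingerprints are all assumptions.

```python
def normalize(fingerprint: str) -> str:
    """Upper-case a SHA-256 fingerprint and strip colons and whitespace."""
    return fingerprint.strip().replace(":", "").upper()


def load_fingerprints(rows):
    """Return (set of fingerprints, number of skipped rows).

    Rows that are empty or the literal "NULL" are counted as skipped.
    A row may be a bare fingerprint or a URL containing "sha256=<hex>"
    (an assumed format for this sketch).
    """
    fingerprints = set()
    skipped = 0
    for row in rows:
        value = row.strip()
        if not value or value.upper() == "NULL":
            skipped += 1
            continue
        if "sha256=" in value:
            value = value.split("sha256=", 1)[1]
        fingerprints.add(normalize(value))
    return fingerprints, skipped


def reconcile(reported_rows, observed_rows):
    """Compare a CA-provided report with independently observed fingerprints."""
    reported, skipped = load_fingerprints(reported_rows)
    observed, _ = load_fingerprints(observed_rows)
    return {
        "reported": len(reported),
        "skipped_rows": skipped,
        "missing_from_report": sorted(observed - reported),
    }


if __name__ == "__main__":
    # Placeholder fingerprints, not real certificates.
    report = ["https://crt.sh/?sha256=" + "aa" * 32, "NULL", "bb" * 32]
    external = ["AA" * 32, "CC" * 32]
    print(reconcile(report, external))
```

Using sets means duplicate rows in either list are counted once, so the "reported" figure is the number of distinct fingerprints rather than the number of rows.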

Paul, are your reports expected to include the SHA-256 fingerprints of all affected Certificates, or all affected Precertificates, or both, or a mixture of some sort? For example, the Precertificate https://crt.sh/?id=12424942307 is included in both of your attachments, but the corresponding Certificate https://crt.sh/?id=12430396544 is absent from both attachments; whereas the CCADB incident reporting requirement is that "the Appendix must include a listing of the complete certificate details of all affected certificates".

The 15 NULL links included in the report are related to clientAuth only certificates, which are not logged to CT or impacted by this incident.

We are submitting all impacted (final) certificates to CT but this might take some time.

We will provide an updated report with links to the final certificates when this submission has been completed.

Please find attached the corrected list with links to the final leaf certificates of the 26653 affected certificates.

Attachment #9391981 - Attachment is obsolete: true
Attachment #9392051 - Attachment is obsolete: true

Please find attached the corrected list with links to the pre-certificates of the 26653 affected certificates.

(In reply to Paul van Brouwershaven from comment #30)

The 15 NULL links included in the report are related to clientAuth only certificates, which are not logged to CT or impacted by this incident.

I would like to correct the statement above: these clientAuth-only EV TLS certificates are impacted by this incident and will also be revoked. In addition, they are non-compliant because they do not contain the serverAuth EKU; another incident report has been submitted for that particular problem: https://bugzilla.mozilla.org/show_bug.cgi?id=1886467

https://crt.sh/?sha256=83921F1D8C35B72B0803FD0244305588FB28F1B769FC3E639A796775DF417F9A
https://crt.sh/?sha256=37D8480DBB941AEB8BA8E1407C2064B122D596A36A515031AFCC30D4E7B40E50
https://crt.sh/?sha256=80CDF6BB358DC9BE82B58B873F00C3A7D7621C9EADC940F836DACDB997993A19
https://crt.sh/?sha256=27527E1049DB2AB48D4C5B0A75C42A76497ECA2463EFD3C6861E6D7936B50686
https://crt.sh/?sha256=3D4F2E42946A3B91A3FFCFDE0E1D8513C0AC37EE5B9835616812A13CA1D47C2E
https://crt.sh/?sha256=F498C181470EB3A72DB92D8014B342E744D9A138CD5E06F071D068AA554AE2FC
https://crt.sh/?sha256=0985AEC6103E81C8BDDBDA4C6EBB9071D1C3E02D621EAF0158B35D0256EA7E5A
https://crt.sh/?sha256=465EDDCB724C65F15FFED1FC223C24E902C61137DCD5E2A042CFBDDE4200C9F0
https://crt.sh/?sha256=DE06283E78C09C831B5A20A8D0C178F7C3A811B2EFA3187ABFA198BD60B04756
https://crt.sh/?sha256=FAAA190D0701F6DB4D912AED7E0A1282FD547FB9E03C3A7D180742AD2AD52268
https://crt.sh/?sha256=E53B6326D4DB08D1D77358399E94F51EDA2D6EEDDE8B5029137787B1A34EC4C9
https://crt.sh/?sha256=9F00010D4B5141BD83B4E0BC2352E30C10E6DF67FA70AEA92DF377D3AE9A47C4
https://crt.sh/?sha256=199CE56803F91DC1628AE94C700AD603A4C1A70D3F91A00B3F6A047ADB7A094D
https://crt.sh/?sha256=541A7EDEF98E81DDBCE84E2E1F477B90E18C309024E40FB91E4F0C76C79B5006
https://crt.sh/?sha256=5CAFCAE9BDED88457FA85FB77BDD7182C6A79DB36BFF51CF8FC8725F83432436
See Also: → 1887753

This is just an update to let you know that, as we work on this incident, we will come back to the community with thoughtful answers to the questions in this bug and an updated incident report.

(In reply to Ryan Dickson from comment #19)

The Chrome Root Program Policy describes an expectation that CA Owners included in the Chrome Root Store adhere to the CA/Browser Forum Baseline Requirements. Beyond this, we expect CA Owners to be accountable for their actions and prioritize remediation when notified of something that hasn’t gone right.

Entrust's initial reporting of this incident and subsequent response fails to meet our expectations.

While the way we initially reported and responded to this incident as well as our initial approach may not have aligned with the community’s established protocols and expectations, our intention was and remains to act in the best interests of those in the web ecosystem, aiming to appropriately balance compliance with practicality and to avoid undue disruption.

We are committed to learning from this experience and to contributing positively to the security and trustworthiness of the web PKI ecosystem. We plan to take the necessary steps to ensure our future actions reflect not just our commitment but also an enhanced understanding of collective standards and expectations.

An updated incident report will be provided by April 12, 2024.


General Comments:

  • As with all bugs, we expect Entrust to respond to comments and questions presented by other community members in this bug (described on CCADB.org).

Understood and acknowledged.


  • When submitting future incident reports, please be sure to closely follow the incident reporting guidelines available on CCADB.org. For example, the initial report fails to describe the number of affected certificates related to the incident or whether Entrust ceased issuance. It also does not describe whether the report was intended to be a preliminary or final report. While many of these questions were answered later in the report through comments or in response to community member questions, community members should be able to expect and rely upon a consistent reporting format.

In the future we will ensure it is clear when we are posting a preliminary report and include an estimated timeframe for posting a complete incident report in the expected format.


  • The security standards enforced through the CA/Browser Forum are not guidelines but requirements intended to ensure the web's safety and trustworthiness. The argument that customer impact justifies deferring or delaying necessary security measures and agreed-upon practices overlooks the fundamental purpose of these standards. Instead, CA Owners must prepare for and mitigate such impacts in advance through proper planning and communication, and they must maintain capabilities to act immediately when necessary.

Please see our answer to question 2 below.


  • CAs are trusted based on their commitment to following the BRs and root program policies. This means the CA Owner, not its customers, needs to design its systems to support and ensure compliance. Entrust should have prepared for this situation and designed its systems accordingly so that each customer could promptly replace their certificates if necessary.

Our commitment is to follow the BRs and root program policies, and our systems are designed to support this effort. Our systems are not a barrier to our customers replacing and revoking these certificates in a timely manner. However, the incident has put a spotlight on key customer challenges and given us further ideas for improvement for both ourselves and our customers.

The typical Entrust customer runs in a complex, regulated, or customized environment. Automation of the certificate lifecycle in these environments is a dedicated project that often doesn’t cover the lifecycle of all systems and services.


  • As a publicly trusted CA Owner, Entrust is responsible for the collective trust of internet users, not just your direct customers. Failure to take corrective action based on well-agreed-upon expectations and practices undermines the security and trustworthiness of the web as a whole.

We recognize our responsibility to the internet users and our actions were focused on preventing disruption, as these certificates did not pose any security risk to the internet users and we initially believed that the incident was the result of an error in the EV guidelines.

We also acknowledge that, once alerted, and having confirmed the error, we should have stopped issuing new EV certificates that lacked the cPSuri and initiated discussion of the guidelines afterwards.

We cover this in more detail in the responses below.


  • This incident and the described impact of certificate revocation were entirely avoidable.

We believe this incident could have been avoided. We are taking measures to prevent a similar incident from happening again, including implementation of additional layers within our systems, and processes designed specifically to detect and address such errors.

By implementing these additional safeguards, we will continue to work towards the highest standards of operation and ensure robust oversight of that operation.

We cover this in more detail in the responses below.


  • Entrust’s failure to cease issuance and correct its issuance profile(s) upon learning of this incident demonstrates questionable judgment which has amplified the issue's negative impact on its customers and their relying parties.

Based on the CA/Browser Forum discussions that informed the September 2023 update (SC-62v2), we truly believed that this “mis-issuance” was due to an error or oversight in the EV Guidelines.

We acknowledge that, once alerted, and having confirmed the error, we should have more quickly stopped issuing new EV certificates that lacked the cPSuri. Again, we believed this was due to an error or oversight in the EV Guidelines. In the future, we will cease certificate issuance pending discussion and resolution of any perceived discrepancies in the guidelines.


Updates Requested:

A. The Timeline Section does not describe when Entrust provided a preliminary report on its findings to the affected Subscribers and the entity that filed the report, as required by Section 4.9.5 of the TLS Baseline Requirements (below). As the reporter of this issue, I did not receive what I would consider a preliminary report within 24 hours of my submission. It’s also unclear whether Entrust contacted the subscribers affected by this incident. A separate bug should be opened due to failing to respond to a Certificate Problem Report in a complete and timely manner. Given these concerns, describing Entrust’s response to the third-party problem report as “what went well” feels like a mischaracterization.

TLS BR Section 4.9.5: Within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both the Subscriber and the entity who filed the Certificate Problem Report. After reviewing the facts and circumstances, the CA SHALL work with the Subscriber and any entity reporting the Certificate Problem Report or other revocation-related notice to establish whether or not the certificate will be revoked, and if so, a date which the CA will revoke the certificate. The period from receipt of the Certificate Problem Report or revocation-related notice to published revocation MUST NOT exceed the time frame set forth in Section 4.9.1.1.

We acknowledge that we did not provide a preliminary incident report to the subscribers and the entity who filed the certificate problem report, as required by the TLS Baseline Requirements section 4.9.5.

We have submitted a preliminary incident report for this which can be found here https://bugzilla.mozilla.org/show_bug.cgi?id=1890123

We promptly contacted subscribers affected by this incident once we decided to revoke all impacted certificates.


B. The Root Cause Analysis Section fails to identify this issue's actual root cause. I interpret your response to emphasize that a “discrepancy” within the CA/Browser Forum policies was largely responsible for this incident. Given that the policy documents are not responsible for updating Entrust’s CP/CPS or configuring and testing Entrust’s issuance systems in a manner that’s inconsistent with the EVGs, the emphasis on the discrepancy should be reconsidered. While it may have contributed to the incident, it is not a root cause. You could try the “5 Whys” methodology observed in 1878106.

Using the 5 Whys methodology below, we conclude that this incident is caused by the following:

1. Why was there a problem?
Because we removed the cPSuri in the policy qualifiers of our EV certificates, believing this to be in compliance with the changes in SC-62v2.

2. Why was the cPSuri removed?
Because the changes in SC-62v2 explicitly call out the certificate profile requirements for Extended Validation certificates, which state that policy qualifiers are not recommended.

3. Why did we not check the Extended Validation Guidelines?
TLS BRs section 7.1.2.7.1 (Subscriber Certificate Types) points to section 7.1.2.7.5 (Extended Validation) for the Extended Validation (EV) certificate type. This section says that the certificate must comply with the current version of the certificate profile in the Extended Validation Guidelines, and follows with specific instructions for the certificate subject, certificate policies, and all other extensions.

While section 7.1.2.7.5 points to the EVGs in the introduction, it follows with explicit call-outs to the EVGs for the certificate subject and all other extensions, but for the certificate policies it points to section 7.1.2.7.9 (Subscriber Certificate Policies) of the TLS BRs, which states that the policy qualifiers are not recommended.

4. Why did we not verify this with the Extended Validation Guidelines?
Based on our participation in standards discussions, we understood that removal of policy qualifiers was an agreed change. The team that drafted and verified this change to the certificate profiles and CPS discussed it in the validation subcommittee of the CA/Browser Forum and aligned on the position that the policy qualifier increases the size of the certificate without providing any value to a typical relying party, and the information may be obtained by other means when necessary.

We did not consider this position to be applicable only to the TLS Baseline Requirements, so we did not consider that there could be conflicting language in the Extended Validation Guidelines.

5. Why did we not detect this issue earlier?
The pre-issuance and post-issuance linters that we used did not detect this issue.
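As an illustration of the kind of check that was missing, a lint for the cPSuri qualifier on EV certificates might look roughly like the sketch below. It is not taken from zlint, certlint, pkilint, or any Entrust tooling. It assumes the certificatePolicies extension has already been decoded into the simple `PolicyInformation` and `PolicyQualifier` structures defined here (both hypothetical), it treats a certificate as EV only if it asserts the CA/Browser Forum EV policy OID, and it accepts a cPSuri attached to any policy entry; those are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import List

EV_POLICY_OID = "2.23.140.1.1"   # CA/Browser Forum EV policy identifier
ID_QT_CPS = "1.3.6.1.5.5.7.2.1"  # id-qt-cps policy qualifier (RFC 5280)


@dataclass
class PolicyQualifier:
    """Hypothetical decoded policy qualifier."""
    qualifier_id: str
    value: str


@dataclass
class PolicyInformation:
    """Hypothetical decoded entry of the certificatePolicies extension."""
    policy_oid: str
    qualifiers: List[PolicyQualifier] = field(default_factory=list)


def lint_ev_cps_uri(policies: List[PolicyInformation]) -> List[str]:
    """Return a list of findings; an empty list means the check passed."""
    if not any(p.policy_oid == EV_POLICY_OID for p in policies):
        return []  # not treated as EV in this sketch, so the check does not apply

    # Collect every cPSuri value, regardless of which policy entry carries it.
    uris = [
        q.value
        for p in policies
        for q in p.qualifiers
        if q.qualifier_id == ID_QT_CPS
    ]
    if not uris:
        return ["EV certificate has no cPSuri policy qualifier"]
    if not any(u.lower().startswith(("http://", "https://")) for u in uris):
        return ["cPSuri policy qualifier is not an HTTP(S) URL"]
    return []
```

A check of this shape only helps if it is maintained independently of the certificate profile it is meant to verify; otherwise a mistaken profile change and the lint can drift together.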


C. The “Action Items” list does not directly address elements of the “What didn’t go well” list. It’s not clear how Entrust will ensure future profile changes are reviewed in the proper context or how all types of certificates will be evaluated successfully using linting tools. It’s also unclear how the “BR of BRs,” which, as I understand it, will still result in the generation of individual BR policy documents for a particular use case (like TLS), will prevent future incidents. Providing more details, including methods for the community to quantify whether each of the remediation tactics was successful, would be helpful.

Based on the root cause discussion above, we are addressing two areas:

We included an action item to deploy pkilint as a new post-issuance linter in addition to our existing pre-issuance and post-issuance linters. However, we could have included an action item to ensure we are reviewing changes in the right context. We will update the list of action items in the updated incident report.

We can add that going forward, any CP/CPS and certificate profile changes will be reviewed by someone not involved in the discussion and drafting of those standards.

The BRs of BRs is intended to streamline and harmonize the existing baseline requirements documents within the CA/Browser Forum. The goal of this streamlining and harmonizing is to reduce duplication and enhance clarity by establishing a unified set of baseline requirements applicable to various certificate use cases.

For this purpose, the different documents need to strictly adhere to RFC 3647 and ideally non-standard subsections need to align between standards. The current interface is supposed to support this harmonization effort, see this presentation that was given at the face-to-face meeting in Delhi earlier this year.


D. The “Appendix” fails to list the affected certificates. Guidance on CCADB.org states, “In particular, in the case of incidents which directly impacted certificates, the Appendix must include a listing of the complete certificate details of all affected certificates.” This was also requested in Comment 1. We expect you to provide the list of all affected certificates. For as long as Entrust continues to misissue certificates related to this specific issue, we expect this list to be updated in a timely manner (i.e., minimally updated weekly).

Acknowledged and understood. The list of affected certificates has since been added to this bug.


Questions:

Q1) Section 4.9.1.1 of the Baseline Requirements unambiguously specifies the expectation for revocation in cases such as this, yet this narrative was excluded from Entrust’s initial incident report. Revocation was only discussed after being described as required in Comment 3. Can you share why this was omitted?

We initially omitted the revocation narrative from our incident report because we believed that the missing “certificatePolicies:policyQualifiers:qualifier:cPSuri” in the issued EV certificates was due to an error in the EV Guidelines. Our intent was not to bypass the expectations in the Baseline Requirements but instead to align with the agreed direction (removing the CPS from TLS certificates) and determine the correct interpretation of the Baseline Requirements versus the EV Guidelines.

In retrospect, we acknowledge that we should have stopped issuance of EV certificates without the policy qualifier upon confirmation of the issue, and then followed up to pursue what we saw as a possible oversight in the EV Guidelines.

We also could have clearly communicated in the incident report the decision and reasoning behind our initial decision not to proceed with revocation. This lack of clarity omitted a critical aspect of our response strategy, which may have led to misunderstandings about our intentions and commitment to the integrity of the web PKI ecosystem.

We are committed to ensuring that our communications, especially in incident reports, fully outline our decision-making processes and actions to provide the community with a complete picture of our response to incidents.


Q2) Root programs have made it clear several times that impacting customers is an unacceptable justification for a failure to act, especially considering the prescribed responses defined in the BRs. Please share why you believe this is now appropriate.

We believed that there was an unintended error in the EV Guidelines, and that this was a unique situation. This led us to the conclusion that it was inappropriate to initiate revocation as this would also be highly disruptive to our subscribers that often operate critical services.

For broader discussion, to maintain the integrity of the ecosystem, it is imperative that subscriber organizations adopt more agile practices in the face of rising threats. However, the reality today is that many large organizations that are critical to the web ecosystem may not be prepared with the capacity for rapid certificate rotation in a non-emergency situation. Our focus is on working with all stakeholders in the ecosystem to build these capabilities in a thoughtful, deliberate manner. We believe that through collaboration and dialogue, we can prepare for the future in a way that safeguards the security, stability, and trustworthiness of the web PKI ecosystem, making it resilient against the inevitable challenges to come. These solutions should support rapid response without triggering undue risks to critical services.

We discuss ways to address these issues in our response to Q11.


Q3) Why did Entrust decide it was in its customers' best interest to continue misissuing certificates after being notified of this issue rather than stopping issuance and correcting its issuance profile(s)?

As noted above, we acknowledge that we should have stopped issuance of EV certificates without the policy qualifier upon confirmation of the issue, before pursuing a discussion on a possible oversight in the EV Guidelines. However, we believed there was an unintended discrepancy between the guidelines, and we were simultaneously trying to avoid disruption to the web ecosystem and resolve the discrepancy with the CA/Browser Forum.

Our intent was not to bypass the guidelines. We believed that it was important to determine the correct interpretation of the TLS BR and the EV Guidelines in a way that was aligned with what we believe was the agreed direction of the SC-62v2 updates to the TLS BR (removing the CPS from TLS certificates).


Q4) Why did Entrust prioritize updating the Baseline Requirement EV Guidelines ahead of changing its issuance practices when the expected result of the BR update would have no bearing on the existing misissued certificates?

While we understand that updates to the requirements have no retroactive effect on existing certificates, we wanted to realign and clarify standards moving forward. We saw this as a constructive step toward resolving the discrepancy and ensuring future compliance in line with an agreed direction.


Q5) Can you describe whether Entrust uses pre-issuance linting and why that was not considered part of the remediation of this incident? Relying solely on post-issuance linting seems to leave an opportunity for future incidents, whereas pre-issuance linting presents an opportunity to prevent them.

Entrust does employ pre-issuance linting, primarily leveraging zlint, which is well-suited for this stage due to its balance of thoroughness and efficiency. This pre-issuance step is a critical part of our quality assurance process, aimed at identifying and preventing potential issues before certificates are issued.

We use certlint, zlint and an Entrust in-house linter for post-issuance linting. In this case, the third-party linters that Entrust uses did not catch this discrepancy, and the internal Entrust linter had been updated with the same change as the certificate profiles.


Q6) Can you describe how Entrust evaluates linting tools to fully comprehend each one's scope, capabilities, and limitations, including as updates are made available?

We review changes before deploying new linters or updates to our existing linting tools. The review includes analyzing the updates, such as lints and unit tests, and potential impact on certificate issuance processes.

We participate in the development and improvement of linting tools by contributing lints based on our experiences and observations in certificate issuance practices. Our collaboration with the broader community helps improve these tools for all users.

Ideally every requirement is explicitly spelled out in a lintable language, with BRs of BRs (see links below) used to automatically generate a requirements matrix. This matrix can be loaded in a GRC system but can also be used to better manage the linting coverage.


Q7) Can you describe how Entrust validates linting tools are working as expected?

To validate that our linting tools are operating as expected, Entrust has established a daily testing protocol. Each day, we issue a certificate from a private CA. This certificate is specifically designed to fail the linter.

This deliberate approach is not foolproof, as it only covers a particular known scenario. We also realize that linters are open-source tools that are developed and advanced on a voluntary basis and that rely on community support. They will not cover all potential issues, but we have contributed, and will continue to contribute, to linters to improve their coverage.
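The daily check described above amounts to a canary test: feed the linter something known to be bad and raise an alarm if it comes back clean. A minimal sketch of that idea follows. The `run_canaries` helper, the toy linter, and the sample names are hypothetical, and running several known-bad samples rather than a single certificate is an assumption of this sketch, not a description of Entrust's process.

```python
from typing import Callable, Dict, List

# A linter is modelled here as a function from a sample to a list of findings.
Linter = Callable[[str], List[str]]


def run_canaries(linter: Linter, known_bad: Dict[str, str]) -> List[str]:
    """Return the names of known-bad samples that the linter failed to flag."""
    return [name for name, sample in known_bad.items() if not linter(sample)]


def toy_linter(sample: str) -> List[str]:
    """Stand-in linter that only knows about one defect (illustration only)."""
    return ["missing cPSuri"] if "no-cps" in sample else []


if __name__ == "__main__":
    canaries = {
        "ev-without-cpsuri": "ev no-cps",
        "ev-without-serverauth": "ev no-serverauth",
    }
    undetected = run_canaries(toy_linter, canaries)
    if undetected:
        print("Linter did not flag:", ", ".join(undetected))
```

Each undetected canary points at a gap in lint coverage rather than at a broken linter installation, which is the limitation noted above for a single known scenario.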


Q8) How does Entrust’s handling of this incident better prepare the affected subscribers to respond to potential security events in the future, such as key compromises?

Our eventual move towards revocation presented a real-world scenario for our subscribers, underscoring the importance of being prepared for emergency situations such as key compromises or other events requiring rapid certificate replacement. This non-security incident distressed subscribers, not all of whom were prepared for rapid certificate replacement.

A structured and industry-wide approach would bolster preparedness across the ecosystem. An initiative like a regularly scheduled "fire drill" for revocation events could serve as a controlled method to enhance readiness among all stakeholders. Initially conducted as announced exercises, and potentially transitioning to unannounced drills in the future, this framework would motivate organizations to develop and refine their security threat response strategies.

Such drills could mimic the pressures and challenges of real revocation events without the immediate risks, providing valuable insights into organizational vulnerabilities, systems, protocols, regulations, and environments that require further preparation for a real security event. This in turn would encourage investments in automation, streamlined communication channels with CAs, and internal processes that can adapt quickly to change.


Q9) What specific steps will Entrust take with these affected customers so that they can recover from this type of incident in the future in a manner that’s fully consistent with the BRs' expectations, and on what timeline?

Entrust is committed to enhancing the resilience and agility of our customers in the face of incidents that require rapid certificate management responses. Recognizing the critical importance of preparedness, we are actively working to prepare our customers to quickly and effectively respond and recover in the face of a mass revocation incident. Our approach includes:

1. Education on Agility: We emphasize the necessity for agility in managing digital certificates through targeted educational efforts, including webinars, customer events, and direct communications. We underscore the significance of preparedness for swift certificate replacement or revocation scenarios. We also ask Security Officers to put this risk on their risk registers.

2. Support for Automation Protocols: Entrust supports several automation protocols, including ACME and others, to facilitate efficient and timely certificate lifecycle management. By leveraging these protocols, customers can automate many aspects of certificate issuance, renewal, and revocation, reducing the manual effort required and enabling faster response times.

3. Certificate Lifecycle Management (CLM) System: The Entrust CLM system and our partners' CLM systems provide integrated platforms for comprehensive certificate management. These systems allow customers to monitor their certificate inventory actively, identify expirations or configurations that may require attention, and automate renewal processes where possible.

4. Addressing Ecosystem Challenges: We are actively working in the IETF to address key ecosystem challenges that impact a subscriber’s ability to implement automation with CAs of their choice, for example, configuring custom ACME endpoints and fallback servers within Cloud Service Providers (CSPs).

The timeline for these activities is ongoing: we run education and awareness campaigns on an ongoing basis, support the advancement of CLM systems to address customer needs and technological advancements, and are involved in collaborative initiatives that help advance our industry and the security of the web ecosystem. In addition, we are exploring the idea of an industry-wide fire drill, as explained in our response to Q8.


Q10) Given the actions described in this bug, why should the community expect Entrust to behave differently in the future (i.e., in a manner that’s fully consistent with the BRs) if a similar incident were to be repeated?

We are committed to ensuring that our actions are in full compliance with the requirements. Entrust's commitment to acting in full compliance with the BRs while upholding ecosystem integrity remains steadfast. Our response strategy will continue to reflect our dedication to secure internet practices, informed by constructive dialogue within the industry.

Reflecting on this incident and its feedback, we acknowledge the need to prioritize adherence to standards and BRs before advocating for changes or corrections to those standards. And we will communicate with greater clarity to ensure that our responses meet community guidelines and advance the knowledge base of the ecosystem.


Q11) Can you describe how something like “ACME auto discovery” is intended to improve agility and response to future incidents that necessitate certificate re-issuance, given that the affected customers have not adopted automation today? Given the wide availability of existing automation capabilities across the ecosystem, why should we expect auto discovery to make a difference for the affected customers?

The concept of "ACME auto discovery" is designed to enhance the agility and resilience of the ecosystem, particularly in scenarios requiring rapid certificate re-issuance. This initiative addresses a crucial gap in the deployment of automation technologies for certificate management: specifically, the constraints imposed by hosting or cloud service providers when subscribers try to use custom ACME servers or external account bindings.

Most providers today default to DV certificates from Let's Encrypt via ACME, but do not offer users flexibility in configuring a custom ACME endpoint or an external account binding. This limitation restricts subscribers' ability to leverage other Certificate Authorities (CAs) that they have contracted.

ACME auto discovery aims to democratize access to automated certificate lifecycle management by making it easier for software and systems to dynamically discover and integrate with any CA's ACME server. By facilitating a more open and versatile approach, subscribers can choose the CAs that best meet their specific requirements. This flexibility is critical if we want subscribers to move to automated solutions and avoid crucial dependencies on one or a few CAs in the ecosystem.

Please see this presentation we made to the CA/Browser Forum F2F #59 in Redmond, WA, USA:
https://cabforum.org/uploads/F2F-59-CABF-SCWG-ACME-Automation.pdf

Please provide an answer to the question posed in comment 26 too.

(In reply to amir from comment #36)

Please provide an answer to the question posed in comment 26 too.

I believe this was addressed in comment 27. Please let us know if that doesn't sufficiently answer your question.

Additionally, please be aware that we are currently preparing an updated incident report that will also include more details in the timeline, which will be available by April 12, 2024.

Please note that we missed the due date of the following action item; we will provide a new date on or before April 19.

Action Item | Kind | Due Date
Deployment of pkilint as post-issuance linter in addition to existing linters | Detect | TBC

Thanks, I'm interested in a couple of statements you've made.

It seems like there were two mistakes in creating the list of impacted certificates, not a huge deal - getting these lists can sometimes be difficult. However, it does put into question a statement that was made in comment 4.

It seems like Entrust made the determination that this misissuance isn't really a concern, and just a misalignment of the rules. However, I'm not sure if I'm able to really see the data you used to make that determination here.

Looking at comment 35, you've made this statement:

In retrospect, we acknowledge that we should have stopped issuance of EV certificates without the policy qualifier upon confirmation of the issue, and then followed up to pursue what we saw as a possible oversight in the EV Guidelines.

I'm glad that you've admitted that what you did wasn't okay, but at the time that you made that decision - why did you think this was okay? Can you please point to any other Bugzilla incident where the CA was told they were misissuing certificates, the CA said "we're just going to keep misissuing certificates," and this was accepted by the compliance regime?

My concrete questions are:

  1. Was the decision you made there to continue misissuing based on:

    1. A previous incident response from Entrust, or another CA?
    2. A root program telling you it is okay to continue doing this?
    3. Something else?
  2. Was that initial incident communication by Entrust evaluated by at least 2 people at Entrust before posting here to "check each other's work and understanding" type deal? If so, was this point raised in those discussions?

  3. Did you reach out to your auditor(s), or any WebTrust certified auditor when making this decision? If so, please share with the community what guidance they gave you?

  4. Did you reach out to any root program members at all before making the decision to continue misissuance? If so, which root programs, and what was the guidance they gave you?

We understand that updates to the requirements have no retroactive effect on existing certificates, we wanted to realign and clarify standards moving forward. We saw this as a constructive step toward resolving the discrepancy and ensuring future compliance in line with an agreed direction.

This statement is weird. If you knew it wasn't retroactive, why wouldn't you at the very least stop issuance? Please note: I am specifically not talking about revocation here. I'm wondering, since you understood that this change won't be retroactive, why did you make the decision that it is okay to continue misissuing?

To validate that our linting tools are operating as expected, Entrust has established a daily testing protocol. Each day, we issue a certificate from a private CA. This certificate is specifically designed to fail the linter.

This deliberate approach is not foolproof as it only covers a particular known scenario. We also realize that linters are open-source tools, that are developed and advanced on a voluntary basis and that they rely on community support. They will not cover all potential issues, but we have and will continue to contribute to linters to improve their coverage.

So, am I reading this right that you effectively have a single negative test in your issuance pipeline? This seems like an action item that should be written out. Negative issuance tests are extremely useful to ensure regressions are caught, and that new deployments of the issuance software in a non-prod environment catch these before they end up having an impact on your prod environment.

A structured and industry-wide approach would bolster preparedness across the ecosystem.

These do exist. For example, embracing shorter lifetime in certificates has the impact of forcing automation on subscribers. Maybe Entrust, being an industry leader in this space, would like to pave the way for shorter lifetime certificates in the EV space? Especially since the Chrome Root Program will be requiring automation in some shape or form for new CAs going forward. I think Entrust can definitely use this as an opportunity to setup best practices for future CAs entering this space.

Entrust's commitment to acting in full compliance with the BRs while upholding ecosystem integrity remains steadfast.

I'm not entirely sure this is true considering that Entrust made a decision to not stop issuance until the certificate profile was fixed on their end. I don't see how that decision was in "full compliance with the BRs."


At the end of the day, root programs have so far been lax with CAs not deciding to revoke. I'm not going to rehash that discussion here, and I'd appreciate if the response to this comment also ignores the revocation issue.

My main concern here has been that, in my entire time being in this community, I have personally never seen a CA make the decision to continue misissuing certificates. So really the tl;dr of this comment is me asking for answers on the full timeline and factors involved in making that decision.

(In reply to amir from comment #39)

Thanks, I'm interested in a couple of statements you've made.

It seems like there were two mistakes in creating the list of impacted certificates, not a huge deal - getting these lists can sometimes be difficult. However, it does put into question a statement that was made in comment 4.

Initially we did not include the revoked and expired certificates; this was due to us using the wrong list, which was intended for our team to follow up with subscribers. Later it was identified that 15 certificates were missing as these were clientAuth only; for this we posted a separate incident.

It seems like Entrust made the determination that this mis-issuance isn't really a concern, and just a misalignment of the rules. However, I'm not sure if I'm able to really see the data you used to make that determination here.

Looking at comment 35, you've made this statement:

In retrospect, we acknowledge that we should have stopped issuance of EV certificates without the policy qualifier upon confirmation of the issue, and then followed up to pursue what we saw as a possible oversight in the EV Guidelines.

I'm glad that you've admitted that what you did wasn't okay, but at the time that you made that decision - why did you think this was okay? Can you please point to any other Bugzilla incident where the CA was told they were misissuing certificates, the CA said "we're just going to keep misissuing certificates," and this was accepted by the compliance regime?

Please see our answers to Q1 in bug #1883843, comment #35.

My concrete questions are:

  1. Was the decision you made there to continue misissuing based on:
    1. A previous incident response from Entrust, or another CA?
    2. A root program telling you it is okay to continue doing this?
    3. Something else?

To our knowledge there has not been such an error in the requirements before.

  2. Was that initial incident communication by Entrust evaluated by at least 2 people at Entrust before posting here to "check each other's work and understanding" type deal? If so, was this point raised in those discussions?

We do review incident reports and postings to Bugzilla before actually posting them, the contents have been discussed and agreed.

  3. Did you reach out to your auditor(s), or any WebTrust certified auditor when making this decision? If so, please share with the community what guidance they gave you?

This is not the responsibility of the auditor, so we did not reach out to them; however, we have scheduled a meeting to discuss these incidents.

  4. Did you reach out to any root program members at all before making the decision to continue misissuance? If so, which root programs, and what was the guidance they gave you?

We did not reach out to any root programs before posting this incident on 2024-03-06; we have since been in contact based on the feedback of the community.

We understand that updates to the requirements have no retroactive effect on existing certificates, we wanted to realign and clarify standards moving forward. We saw this as a constructive step toward resolving the discrepancy and ensuring future compliance in line with an agreed direction.

This statement is weird. If you knew it wasn't retroactive, why wouldn't you at the very least stop issuance? Please note: I am specifically not talking about revocation here. I'm wondering, since you understood that this change won't be retroactive, why did you make the decision that it is okay to continue misissuing?

Please see our answers to Q3 in bug #1883843, comment #35.

To validate that our linting tools are operating as expected, Entrust has established a daily testing protocol. Each day, we issue a certificate from a private CA. This certificate is specifically designed to fail the linter.

This deliberate approach is not foolproof as it only covers a particular known scenario. We also realize that linters are open-source tools, that are developed and advanced on a voluntary basis and that they rely on community support. They will not cover all potential issues, but we have and will continue to contribute to linters to improve their coverage.

So, am I reading this right that you effectively have a single negative test in your issuance pipeline? This seems like an action item that should be written out. Negative issuance tests are extremely useful to ensure regressions are caught, and that new deployments of the issuance software in a non-prod environment catch these before they end up having an impact on your prod environment.

As stated, we do check if the linting tools are working, but it’s not realistic to check each individual lint. This is why we look at the test coverage of linting tools, as described in the answer to Q6 in bug #1883843, comment #35. It is another reason why we prefer zlint over pkilint, as it requires positive and negative tests for each lint and comes with integration tests with a large corpus of test certificates.
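To illustrate what a positive and a negative test for a single lint could look like, here is a minimal Python sketch. It is not zlint or pkilint code, and it is not Entrust's pipeline: the `PolicyInformation` model, the `lint_ev_cps_uri` function, and the example URL are hypothetical, and the rule is simplified to "an EV certificate carries at least one CPS URI policy qualifier". Only the OID values are real.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Real OIDs: id-qt-cps (RFC 5280) and the CA/Browser Forum EV / OV policy OIDs.
ID_QT_CPS = "1.3.6.1.5.5.7.2.1"
EV_POLICY_OID = "2.23.140.1.1"
OV_POLICY_OID = "2.23.140.1.2.2"


@dataclass
class PolicyInformation:
    """Hypothetical, simplified model of one certificatePolicies entry."""
    policy_oid: str
    # (qualifier OID, value) pairs
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)


def lint_ev_cps_uri(policies: List[PolicyInformation]) -> List[str]:
    """Hypothetical lint: return findings; an empty list means pass or not applicable."""
    if not any(p.policy_oid == EV_POLICY_OID for p in policies):
        return []  # the lint only applies to EV certificates
    for policy in policies:
        for qualifier_oid, value in policy.qualifiers:
            if qualifier_oid == ID_QT_CPS and value.startswith("http"):
                return []
    return ["EV certificate has no cPSuri policy qualifier"]


def run_lint_self_test() -> bool:
    """One positive, one negative, and one not-applicable case for the lint."""
    with_cps = [PolicyInformation(EV_POLICY_OID, [(ID_QT_CPS, "https://ca.example/cps")])]
    without_cps = [PolicyInformation(EV_POLICY_OID)]
    non_ev = [PolicyInformation(OV_POLICY_OID)]

    positive_ok = lint_ev_cps_uri(with_cps) == []
    negative_ok = lint_ev_cps_uri(without_cps) != []
    not_applicable_ok = lint_ev_cps_uri(non_ev) == []
    return positive_ok and negative_ok and not_applicable_ok
```

A pipeline could run a self-test like `run_lint_self_test()` for every lint on each deployment, which would extend a single daily "known bad" certificate to one known-good and one known-bad case per rule.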

A structured and industry-wide approach would bolster preparedness across the ecosystem.

These do exist. For example, embracing shorter lifetime in certificates has the impact of forcing automation on subscribers. Maybe Entrust, being an industry leader in this space, would like to pave the way for shorter lifetime certificates in the EV space? Especially since the Chrome Root Program will be requiring automation in some shape or form for new CAs going forward. I think Entrust can definitely use this as an opportunity to setup best practices for future CAs entering this space.

Our OV and EV certificates are available through ACME, but the adoption of ACME by our subscriber base is low. As mentioned in several other comments, the typical Entrust customer runs in a complex, regulated, or customized environment. Automation of the certificate lifecycle in these environments is a dedicated project that often doesn’t cover the lifecycle of all systems and services. On the other hand, most providers today default to DV certificates from Let's Encrypt via ACME, but do not offer users flexibility in configuring a custom ACME endpoint or an external account binding. This limitation restricts subscribers' ability to leverage other Certificate Authorities (CAs) that they have contracted such as Entrust.

Updated Incident Report

Summary

This incident report describes the mis-issuance of EV certificates as a result of certificate profile changes implemented to comply with changes introduced by Ballot SC-62v2.

Impact

A total of 26,668 EV certificates are impacted by this incident, which are all EV certificates issued by Entrust since the changes for Ballot SC-62v2 were implemented on September 11, 2023, and until we corrected the certificate profile on March 18, 2024.

Timeline

All times are UTC.

2023-03-31:

  • Drafted certificate profile updates for SC-62v2.

2023-04-03:

  • Certificate profile updates reviewed by a second person of the compliance team.

2023-09-11:

  • Certificate profile updates deployed to production and mis-issuance started.

2023-09-15:

  • SC-62v2 became effective.

2024-03-04:

  • 13:00 Our team received a report from Ryan Dickson (in a personal capacity) about 10+ certificates that seem to be missing the required certificatePolicies:policyQualifiers:qualifier:cPSuri.
  • 14:41 Requested investigation from Incident Review Team.

2024-03-05:

  • 10:42 Mis-issuance confirmed, requested a report of all impacted EV TLS certificates.
  • 11:30 Confirmed that both the Code Signing and S/MIME requirements do not require the cPSuri.

2024-03-06:

  • 08:35 Publication of the initial incident report.
  • 15:00 Revised conclusion that this is actually not a mis-issuance, but rather an error in the TLS EVGs, as we correctly followed the EV Certificate profile as defined by the TLS BRs, which is also in accordance with the strong desire of the browsers to remove the cPSuri “because this information increases the size of the Certificate without providing any value to a typical Relying Party, and the information may be obtained by other means when necessary”.

2024-03-13:

  • 16:28 Published initial draft of ballot SC-72 to align the TLS EVGs with the TLS BRs.
  • 18:05 Added Dimitris Zacharopoulos (HARICA) as first endorser of ballot SC-72.

2024-03-14:

  • 10:36 Updated SC-72 draft based on feedback.
  • 11:55 Added Iñigo Barreira (Sectigo) as second endorser of ballot SC-72.

2024-03-15:

  • 18:06 Mozilla asks to fix the certificate profile and to stop mis-issuance.
  • 21:49 Meeting with a browser vendor to discuss the incident.

2024-03-18:

  • 15:50 Meeting with browser vendor to discuss the incident.
  • 19:30 Meeting with browser vendor to discuss the incident.
  • 21:40 We stopped issuing mis-issued certificates and fixed the EV certificate profile.
  • 23:00 We made a report available to all customers listing certificates impacted.

2024-03-19:

  • 05:00 All impacted customers have been notified by email that their certificates will be revoked and that they need to replace these as soon as possible.
  • 11:27 A list of affected certificates was added to this bug.

2024-03-20:

  • 08:36 Uploaded updated list of affected certificates to this bug.
  • 14:42 Posted an incident report about clientAuth TLS Certificates without serverAuth EKU detected in the reporting for this incident, see bug #1886467.
  • 14:49 Added a list of the 15 missing clientAuth only EV TLS Certificates to this bug.
  • 17:22 Filed a delayed revocation incident report, see bug #1886532.

2024-04-01:

  • 12:00 Voting on ballot SC-72 passed; the ballot is in the IPR review period until May 3, 2024.

2024-04-06:

  • 12:00 Posted a preliminary incident report that we failed to provide a preliminary incident report according to TLS BR 4.9.5, see bug #1890123.

Root Cause Analysis

Using the 5 Whys methodology below, we conclude that this incident is caused by the following:

1. Why was there a problem?

Because we removed the cPSuri in the policy qualifiers of our EV certificates, believing this to be in compliance with the changes in SC-62v2.

2. Why was the cPSuri removed?

Because the changes in SC-62v2 explicitly call out the certificate profile requirements for Extended Validation certificates, which state that policy qualifiers are not recommended.

3. Why did we not check the Extended Validation Guidelines?

TLS BRs section 7.1.2.7.1 (Subscriber Certificate Types) points to section 7.1.2.7.5 (Extended Validation) for the Extended Validation (EV) certificate type. This section says that the certificate must comply with the current version of the certificate profile in the Extended Validation Guidelines, and follows with specific instructions for the certificate subject, certificate policies, and all other extensions.

While section 7.1.2.7.5 points to the EVGs in the introduction, it follows with explicit call-outs to the EVGs for the certificate subject and all other extensions, but for the certificate policies it points to section 7.1.2.7.9 (Subscriber Certificate Policies) of the TLS BRs, which states that the policy qualifiers are not recommended.

4. Why did we not verify this with the Extended Validation Guidelines?

Based on our participation in standards discussions, we understood that removal of policy qualifiers was an agreed change. The team that drafted and verified this change to the certificate profiles and CPS discussed it in the validation subcommittee of the CA/Browser Forum and aligned on the position that the policy qualifier increases the size of the certificate without providing any value to a typical relying party, and the information may be obtained by other means when necessary.

We did not consider this position to be applicable only to the TLS Baseline Requirements, so we did not consider that there could be conflicting language in the Extended Validation Guidelines.

5. Why did we not detect this issue earlier?

The pre-issuance and post-issuance linters that we used did not detect this issue.

This Root Cause analysis helped us to identify areas of improvement, like further separating standardization efforts from review and implementation in our compliance and change management processes.
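The cross-reference gap described in questions 3 and 4 can be illustrated with a small Python sketch. It is an illustration only, not a description of any tooling used by Entrust: the `Level` enum and the `effective_requirement` function are hypothetical, and the model assumes that a certificate bound by several documents must satisfy all of them at once. The two input values come from this report: the TLS BRs list policy qualifiers as NOT RECOMMENDED, while the EVGs required the cPSuri.

```python
from enum import Enum
from typing import Iterable


class Level(Enum):
    """Hypothetical, simplified requirement levels for one certificate field."""
    MUST = "MUST"
    MAY = "MAY"
    NOT_RECOMMENDED = "NOT RECOMMENDED"
    MUST_NOT = "MUST NOT"


def effective_requirement(levels: Iterable[Level]) -> Level:
    """Combine the levels from every applicable document into one effective level.

    Assumption: absolute levels (MUST / MUST NOT) override permissive ones,
    and a MUST combined with a MUST NOT means no compliant profile exists.
    """
    seen = set(levels)
    if Level.MUST in seen and Level.MUST_NOT in seen:
        raise ValueError("documents conflict: no compliant profile exists")
    if Level.MUST in seen:
        return Level.MUST
    if Level.MUST_NOT in seen:
        return Level.MUST_NOT
    if Level.NOT_RECOMMENDED in seen:
        return Level.NOT_RECOMMENDED
    return Level.MAY  # also returned when no document says anything


# Policy qualifier (cPSuri) on an EV TLS certificate, as described in this report.
tls_br_level = Level.NOT_RECOMMENDED  # TLS BRs after SC-62v2
evg_level = Level.MUST                # EV Guidelines before SC-72
ev_cps_uri_level = effective_requirement([tls_br_level, evg_level])
```

Under these assumptions `ev_cps_uri_level` is `Level.MUST`: NOT RECOMMENDED still permits the qualifier, so the only profile that satisfies both documents keeps the cPSuri. Reviewing a profile change against the BRs alone yields NOT RECOMMENDED, which is the review context this incident describes.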

Lessons Learned

  • Additional layers within our systems and processes, designed specifically to detect and address errors such as these, could help to prevent similar issues.
  • Changes should be reviewed by someone not involved in the standardization itself.
  • Communication and clarity in incident reporting needs to be improved.
  • Posting a preliminary report with an estimated timeframe for posting a complete incident report gives us more time to work out the report according to the expectations of the community.
  • This non-security incident distressed subscribers, not all of whom were prepared for rapid certificate replacement.
  • Subscribers need to be better prepared for mass revocation incidents.
  • A structured and industry-wide approach would bolster preparedness across the ecosystem. An initiative like a regularly scheduled "fire drill" for revocation events could serve as a controlled method to enhance readiness among all stakeholders.

What went well

What didn't go well

  • Certificate profile changes were reviewed in the wrong context.
  • We did not use pkilint on TLS certificates.
  • The initial incident report and subsequent responses did not meet the expectations.
  • We should have stopped issuance of EV certificates without the policy qualifier upon confirmation of the issue, and then followed up to pursue what we saw as a possible oversight in the EV Guidelines.
  • We failed to respond to a certificate problem report within 24 hours, see the incident report in bug #1885754.
  • We failed to provide a preliminary incident report to the subscribers and the entity who filed the certificate problem report, see the incident report in bug #1890123.
  • During the remediation of this incident we failed to handle CPS updates correctly and introduced new errors, see also the incident report in bug #1887753 and bug #1890896.
  • The list of impacted certificates needed to be updated more than once.

Where we got lucky

  • The certificatePolicies:policyQualifiers:qualifier:cPSuri is not required by the Code Signing and S/MIME requirements.

Action Items

Action Item | Kind | Due Date
Deployment of pkilint as post-issuance linter in addition to existing linters | Detect | Done
Propose a ballot to align the EVG with the TLS BRs | Prevent | Ballot passed
Propose a BR of BRs that would help to avoid discrepancies | Prevent | Done
Improve change management procedures | Prevent | 2024-06-30
Improve incident management procedures | Prevent | 2024-06-30

Appendix

Details of affected certificates

See the attachment to this bug report.

To our knowledge there has not been such an error in the requirements before.

That's not what I was asking. I was asking what you made your decision to respond to this the way you did based on?

While the way we initially reported and responded to this incident as well as our initial approach may not have aligned with the community’s established protocols and expectations, our intention was and remains to act in the best interests of those in the web ecosystem, aiming to appropriately balance compliance with practicality and to avoid undue disruption.

This is the comment where you say you've responded to this. This does not answer how you arrived at this strategy.

We do review incident reports and postings to Bugzilla before actually posting them, the contents have been discussed and agreed.

So, multiple policy authority folks at Entrust decided on this path. Did anyone raise concerns around the lines of "hey we should stop issuance"?

These are the things I find concerning here:

  1. To my knowledge, no CA has made the action to continue misissuing certificates once they found out that they are misissuing. Combined with Entrust's involvement in the CA/B, I really do not see this as a mistake, but rather a deliberate decision.

  2. Given Entrust's involvement in CA/B, when Entrust says "initial approach may not have aligned with the community’s established protocols and expectations" - how is this actually possible? You are an active member of this community with plenty of CA expertise. Maybe I would've understood if this statement was from a CA that was minted in the last few months.

  3. Given the recent CPS update incidents, and the failure to revoke incidents surrounding them, please tell me how Entrust isn't once more doing the "Retroactive rule change" that they have now learned was not ok in this incident?

So right now, as a member of the public, my trust in Entrust is as low as it can get. Not only because of the initial response to this incident, that still you have not given us an explanation on how this approach was crafted internally (based on what facts, what previous experiences, what understandings of the rules). But also, uou've effectively had the same approach to the other incidents that ended up branching off of this. Once you've detected problems, it seems like your approach has been well, we'll just change the rules and we'll be okay!

Beyond that, you've been unable to accept that revocation events are not going away, and all your mitigations have been to approach this with let's just entirely eliminate revocation events!

I think at this point we're just going in circles, so I think I no longer have any questions beyond the ones in this comment.

I call on root programs to take appropriate actions here, as the moral hazard of inaction will (and, honestly, somewhat has) taint(ed) the public's trust on WebPKI, and it will make it significantly harder to enforce the self-regulated requirements in the future on different CAs.

s/uou've/you've

Apologies for the typo :)

I must agree with and echo Amir's comment here.

All of these problems stem from Entrust not acting as a trusted CA should.
Even now in https://bugzilla.mozilla.org/show_bug.cgi?id=1886532 we see that in over 3 weeks, not even half of the 'affected customers' have been 'fully fixed' and still not even half of the certificates have been revoked.
It is unacceptable, and I agree that as a member of the public and someone understanding of the field of PKI - Entrust has no trust left.

I hope that other CAs will watch and learn from this, but to do so Mozilla and other root programs here MUST take action. These repeated failings show clearly that Entrust cannot be trusted in the WebPKI and needs to be removed.

(In reply to amir from comment #42)

To our knowledge there has not been such an error in the requirements before.

That's not what I was asking. I was asking what you made your decision to respond to this the way you did based on?

As described previously, we examined the issue and made a determination that it would be in the best interests of the Web PKI to not revert to a practice that the CA/Browser Forum had adopted as “not recommended.”

While the way we initially reported and responded to this incident as well as our initial approach may not have aligned with the community’s established protocols and expectations, our intention was and remains to act in the best interests of those in the web ecosystem, aiming to appropriately balance compliance with practicality and to avoid undue disruption.

This is the comment where you say you've responded to this. This does not answer how you arrived at this strategy.

See above; we do not have additional information to share beyond our reasoning at the time. We have acknowledged that we should have stopped issuance of EV certificates without the policy qualifier upon confirmation of the issue, and then followed up to pursue what we saw as a possible oversight in the EV Guidelines.

We do review incident reports and postings to Bugzilla before actually posting them, the contents have been discussed and agreed.

So, multiple policy authority folks at Entrust decided on this path. Did anyone raise concerns around the lines of "hey we should stop issuance"?

These are the things I find concerning here:

  1. To my knowledge, no CA has made the action to continue misissuing certificates once they found out that they are misissuing. Combined with Entrust's involvement in the CA/B, I really do not see this as a mistake, but rather a deliberate decision.

  2. Given Entrust's involvement in CA/B, when Entrust says "initial approach may not have aligned with the community’s established protocols and expectations" - how is this actually possible? You are an active member of this community with plenty of CA expertise. Maybe I would've understood if this statement was from a CA that was minted in the last few months.

Our experience and knowledge as an active member of the community led us to view these circumstances as exceptional.

  1. Given the recent CPS update incidents, and the failure to revoke incidents surrounding them, please tell me how Entrust isn't once more doing the "Retroactive rule change" that they have now learned was not ok in this incident?

We did not retroactively change any rules. In the CPS incidents, we have decided to not revoke these certificates as described in those respective incident reports, i.e., because they would result in the issuance of identical certificates, there were no security implications due to the mis-issuance and the revocation and reissuance of these would have been disruptive to the web ecosystem.

So right now, as a member of the public, my trust in Entrust is as low as it can get. Not only because of the initial response to this incident, that still you have not given us an explanation on how this approach was crafted internally (based on what facts, what previous experiences, what understandings of the rules). But also, uou've effectively had the same approach to the other incidents that ended up branching off of this. Once you've detected problems, it seems like your approach has been well, we'll just change the rules and we'll be okay!

Beyond that, you've been unable to accept that revocation events are not going away, and all your mitigations have been to approach this with let's just entirely eliminate revocation events!

We do not assume that revocation events will go away. As previously noted, we are undertaking a great deal of effort to make our subscribers more agile.

I think at this point we're just going in circles, so I think I no longer have any questions beyond the ones in this comment.

We are open to constructive discussion and dialogue with members of this community to see how we can better deal with incidents such as these, i.e., where there is no impact to security, revocation and reissuance would result in identical certificates or certificates in conflict with other guidelines due to guideline inconsistencies and cause disruption to the web ecosystem.

I call on root programs to take appropriate actions here, as the moral hazard of inaction will (and, honestly, somewhat has) taint(ed) the public's trust on WebPKI, and it will make it significantly harder to enforce the self-regulated requirements in the future on different CAs.

We do not agree there is a moral hazard here for all of the reasons articulated in our answers above. This type of incident is not something we have seen before and is one from which we and other CAs can learn.

(In reply to JR Moir from comment #44)

I must agree with and echo Amir's comment here.

All of these problems stem from Entrust not acting as a trusted CA should.
Even now in https://bugzilla.mozilla.org/show_bug.cgi?id=1886532 we see that in over 3 weeks, not even half of the 'affected customers' have been 'fully fixed' and still not even half of the certificates have been revoked.
It is unacceptable, and I agree that as a member of the public and someone understanding of the field of PKI - Entrust has no trust left.

I hope that other CAs will watch and learn from this, but to do so Mozilla and other root programs here MUST take action. These repeated failings show clearly that Entrust cannot be trusted in the WebPKI and needs to be removed.

We have explained that we are dealing with different subscribers than other CAs that might issue much larger volumes than we do. We hope that members of the community who have experience in the field where Entrust operates understand our challenges and that we are doing everything we can to make our subscribers better prepared and more agile for security incidents. We have had teams working with our subscribers and walking them through the revocation and reissuance process. This process has included multiple touchpoints with subscribers by phone, email, and text message. The impact to subscribers is an important consideration in certain instances like these where security issues are not at play, because when subscribers fail to act this will directly impact relying parties.

The impact to subscribers is an important consideration in certain instances like these where security issues are not at play, because when subscribers fail to act this will directly impact relying parties.

As I’ve pointed out previously, every CA that charges subscribers has a monetary incentive to err on the side of leniency toward their subscribers. The ecosystem balances that incentive with the threat of distrust.

We hope that members of the community who have experience in the field where Entrust operates understand our challenges and that we are doing everything we can to make our subscribers better prepared and more agile for security incidents.

If this mailing list is representative of the community’s sentiment toward Entrust, I wouldn’t count on that being the case. It would be wise to plan and respond accordingly.

(In reply to Paul van Brouwershaven from comment #45)

Our experience and knowledge as an active member of the community led us to view these circumstances as exceptional.

In this and in (too) many other comments on Entrust incident issues in Bugzilla, we are reminded of how active Entrust is in the community, the length of experience they have as a CA, how they nobly push for a stronger ecosystem, their work to advance standards and certificate agility, and even that they do webcasts to help their customers understand how…it’s important to be able to replace certificates. (Perhaps a thing to make clearer at the time of sale, if it wouldn’t harsh the sales call’s vibe too much?)

None of that of course excuses or explains the numerous, often repetitive lapses in correct and accountable operation of their certificate issuance. If anything, it makes me less confident in Entrust as a future keeper of a key to the security of the entire web. It clearly is not (or certainly shouldn’t be) an issue of ignorance in need of better education. Entrust, as Entrust tells us, fairly lives and breathes the norms and practices expected of a WebPKI root certificate authority, and yet…they have failed on virtually every front: maintenance of documentation, validation of certificate attributes, taking responsibility for the integrity of their validation pipeline, response to issue reports, meeting their own deadlines to provide responsive information in incident discussions, and of course revocation of the certificates that are caused to be misissued by this cascade of error. If Entrust, with all that experience, still cannot manage to operate a CA robustly, then what change could possibly suffice to give root programs, and the ecosystem they serve, confidence that Entrust is capable of genuine improvement? Perhaps indeed Bruce and Paul and others are doing everything humanly possible to run a CA responsibly, but are let down by Entrust’s corporate commitment to investment or operational correctness—sad for them, but a possibility that I entreat the root programs to consider. They have contributed checks to zlint, we’re told, but have incident-causing challenges deploying it appropriately within their own highly sensitive and powerful operations nonetheless! This is not a sign of a well organization.

Indeed, Entrust would have us believe that their customers are Very Special, and would be confused by repeated revocations, but have businesses that are highly dependent on these certificates; a large number of big companies with some glaring operational risks there, I guess they’re saying. I think it likely that Entrust would be embarrassed to have to tell these Very Special customers how often the CA they rely on manages to fall short of the requirements that, we are assured, Entrust has deep experience with.

There was a time when I was responsible for the operation of Mozilla’s root program, which I served chiefly by delegating it into the excellent care of Kathleen Wilson, and I’m glad to see how much more organized and detailed things are than when I last operated in this space. I don’t miss the constant special pleading to avoid consequences for subpar operation, but I do fondly remember making Mozilla’s decision to participate in the universal removal of Diginotar. I’m sure that Ben and Ryan and the other modern operators are better than I was at the job of keeping Ligma Certificate Authority and Burger Barn, LLC from damaging the web.

Even from my removed perch reading a survey of historical incidents (not exclusively Entrust, though I would say that they punch above their weight here), it seems clear that root programs are very reluctant to distrust an existing root because of disruption to relying parties and subscribers, both of which groups tend to be blameless other than subscribers perhaps voting with their dollars for the continued existence of an incompetent CA. Here I think that Amir has the right of it(*): there need to be mechanisms available to reduce the damage caused by future mishaps in issuance, by temporarily or permanently restricting the power of the certificates that a known-unreliable CA can issue.

I think it would be very appropriate for root programs to implement the capability to put maximum-validity-period restrictions on included roots. If Entrust—to randomly select a CA—cannot be trusted to issue certificates appropriately or revoke them promptly according to the norms and expectations of the WebPKI ecosystem, then at least we can make sure that a mis-issued certificate isn’t a problem for more than 30/60/90 days. Let’s have ccTLD restrictions too, we’ve been talking about them since at least those heady Diginotar days.

(*) highly recommended reading: https://webpki.substack.com/p/beyond-distrust

FOUR EARTH YEARS AGO, Entrust was educated on the fact that “this misissued certificate doesn’t seem to have a security impact” was not an acceptable reason to declare a situation as exceptional and avoid revocation.

Ryan Sleevi, may his soul have healed from the ordeal, predicted exactly this “we don’t wanna” future in https://bugzilla.mozilla.org/show_bug.cgi?id=1651481#c7

Correct, I do not consider this to be exceptional, and in fact contrary to the intent of this section. I want to make sure Entrust is aware of this, because it would be quite concerning if such a consideration is used again, and is deeply concerning that this was Entrust's rationale at the time.

Please note also the plan promised those four years ago by Entrust in https://bugzilla.mozilla.org/show_bug.cgi?id=1651481#c6

  • We will not make the decision not to revoke.
  • We will plan to revoke within the 24 hours or 5 days as applicable for the incident.
  • We will provide notice to our customers of our obligations to revoke and recommend action within 24 hours or 5 days based on the BR requirements.
  • We will recommend to our customers to implement automation of certificate management.
    [ed: still recommending this, we’re told!]
  • We will increase our ability for correct implementation and testing to ensure that certificate profiles will meet the latest CA/Browser Forum or root program requirements.
  • We will monitor the Mozilla incidents and the discussion list to discover problems which other CAs have experienced and how they were resolved. This will allow us to review and react if required to our own implementation. This will also help to minimize the number of mis-issued certificates, which will reduce the risk of late revocation.
  • We will manage and update our pre-issuance and post-issuance linting to discover or prevent the problem early.

And indeed Previous Ryan took this to be a very strong commitment:

That said, I'm encouraged by Entrust's commitment that they will never, in any circumstance, delay revocation again, for any reason. If that's not what is meant by Comment #6, however, then I think a more careful analysis of the factors that may cause Entrust to delay revocation, and the specific steps being taken to address those, is necessary.

The Entrust we have witnessed in so many incidents since then is not one that I think can be reasonably viewed as having lived up to their commitments to root programs in that bug. Indeed, they have promised to do those things again and again in subsequent incidents, many of which would have been prevented if they had kept their promises. Mr Sleevi also saw this possibility, sadly:

I'm concerned about the "We considered the policy, but we decided it doesn't matter" response, and there's not really more data at this point to support things. This would appear a willful, intentional, and flagrant violation. I realize that some no doubt consider the "SHA-384 vs SHA-256" to be minor, and think a large reaction is unjustified. However, on a substantive level, it's a question about whether the CA will follow Root Program requirements, even if they don't understand or agree with them. This is telling, because we've seen in the past CAs ignore requirements they don't understand or agree with, such as "Don't issue a MITM certificate" (e.g. Trustwave). It also raises concerns whether the CA is willing and is capable of revoking in response to security and compliance incidents.

(It’s darkly amusing to note that Entrust has been playing, for so long, both the “we are so experienced” and “our customers are unicorn snowflakes, and will collapse the global economy if they have to rotate within five days” cards. More dark than amusing when considering the significance of their role in the WebPKI ecosystem and the dependence of so many critical enterprises on their shabby operation.)

We have no updates for this week and will continue to monitor the bug.
