Open Bug 1887705 Opened 3 months ago Updated 3 days ago

Entrust: Delayed revocation of clientAuth TLS Certificates without serverAuth EKU

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: paul.vanbrouwershaven, Assigned: paul.vanbrouwershaven, NeedInfo)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Preliminary Incident Report

Summary

Entrust has issued clientAuth TLS Certificates without the serverAuth EKU as reported in #1886467.
All affected certificates should have been revoked within 5 days after we were made aware of the incident.
This incident report focuses on the delayed revocation; other updates will be provided in #1886467.
We are working actively to complete revocation and will provide weekly updates on our progress.

Impact

  • All certificates affected by the original incident (#1886467) are also affected by this incident.
  • Some customers impacted by this incident are also impacted by incident #1883843 (cPSuri missing).

Timeline

All times are UTC.

2024-03-20:

  • 14:00 We started to approach impacted customers and asked them to replace and revoke their id-kp-clientAuth only TLS certificates.
  • 14:42 Published original incident report (#1886467).

Root Cause Analysis

Revocation of affected certificates is taking longer than five days due to the following:

  • Some customers indicated that using certificates that contain both id-kp-clientAuth and id-kp-serverAuth would disrupt their applications and environments.
  • The certificate profile changed.
    • Customers need more time to test the impact of the changed certificate profile.
    • Some customer CLM systems started to decline the replacement certificates.
  • Customers are experiencing a high workload already as a result of incident #1883843.
  • Customers indicated that these certificates are more challenging to replace than regular id-kp-serverAuth certificates.
  • In addition, several customer systems are in lockdowns due to end of fiscal year/quarter or the Easter holiday.

Lessons Learned

What went well

What didn't go well

Where we got lucky

Action Items

In addition to the action items listed in #1886467 and #1886532, we have identified the following action items.

Action Item Kind Due Date

Appendix

Details of affected certificates

See the original incident (#1886467).

Assignee: nobody → paul.vanbrouwershaven
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [leaf-revocation-delay]

We are working with 114 customer accounts to revoke and re-issue 1,176 affected clientAuth TLS Certificates without the serverAuth EKU. Here is a summary of our progress as of this posting:

  • 137 of 1,176 certificates have been revoked or expired.
  • 208 certificates have been re-issued with revocation pending.
  • 30 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).

We will be providing weekly updates on our progress until this issue is fully remediated.

Update on the revocation progress:

  • 146 certificates have been revoked or expired.
  • 74 certificates have been re-issued with revocation pending.
  • 38 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).

Please note that we have corrected the method for calculating the number of certificates listed as re-issued with pending revocation from our previous update.

Update on the revocation progress:

  • 176 certificates have been revoked or expired.
  • 60 certificates have been re-issued with revocation pending.
  • 52 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).

Please note that one customer holds the large majority of the certificates impacted by this incident. They are testing the alternative solution on a subset of endpoints to avoid disruption.

Update on the revocation progress:

  • 199 certificates have been revoked or expired.
  • 82 certificates have been re-issued with revocation pending.
  • 60 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).

Update on the revocation progress:

  • 218 certificates have been revoked or expired.
  • 76 certificates have been re-issued with revocation pending.
  • 69 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).

Update on the revocation progress:

  • 273 certificates have been revoked or expired.
  • 26 certificates have been re-issued with revocation pending.
  • 88 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).

Update on the revocation progress:

  • 310 certificates have been revoked or expired.
  • 15 certificates have been re-issued with revocation pending.
  • 101 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).
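For readers who want to follow the trend across the updates, here is a minimal sketch that tabulates the figures posted so far. The numbers are copied from the updates above; the function and field names are made up for illustration, and the "untouched" column is only a derived estimate (total minus revoked/expired minus re-issued), not a figure Entrust reported. Note also that the first "re-issued" figure was later described as calculated with a method that has since been corrected.

```python
# Figures copied from the weekly updates in this bug:
# (revoked_or_expired, reissued_with_revocation_pending, accounts_fully_remediated)
TOTAL_CERTS = 1176
TOTAL_ACCOUNTS = 114

updates = [
    (137, 208, 30),  # first figure for "re-issued" predates the corrected method
    (146, 74, 38),
    (176, 60, 52),
    (199, 82, 60),
    (218, 76, 69),
    (273, 26, 88),
    (310, 15, 101),
]


def progress_rows(updates, total_certs=TOTAL_CERTS, total_accounts=TOTAL_ACCOUNTS):
    """Return one summary dict per update (hypothetical helper, not an Entrust tool)."""
    rows = []
    for done, pending, accounts in updates:
        rows.append({
            "certs_pct": round(100 * done / total_certs, 2),
            "accounts_pct": round(100 * accounts / total_accounts, 2),
            # Derived estimate: neither revoked/expired nor re-issued yet.
            "untouched": total_certs - done - pending,
        })
    return rows


rows = progress_rows(updates)
for number, row in enumerate(rows, start=1):
    print(f"update {number}: {row['certs_pct']}% of certificates, "
          f"{row['accounts_pct']}% of accounts, ~{row['untouched']} untouched")
```

The gap between the account percentage and the certificate percentage in the latest update is consistent with the earlier note that a single customer holds the large majority of the affected certificates.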

By my count at 310 certificates of 1,176 affected certificates we're at 26.36% revoked after 43 days. That is ... not good.

Do we have a detailed breakdown of affected subscribers and when revocation will actually occur?

(In reply to Paul van Brouwershaven from comment #2)

Please note that we have corrected the method for calculating the number of certificates listed as re-issued with pending revocation from our previous update.

Did we ever receive a corrected figure and list of impacted certificates for this issue? This is still a Preliminary Incident Report 38 days in.

This is still a Preliminary Incident Report 38 days in.

Agreed. The root cause section is effectively sparse of details, and is blaming customers for Entrust's inability to revoke on time.

(In reply to Wayne from comment #8)

By my count at 310 certificates of 1,176 affected certificates we're at 26.36% revoked after 43 days. That is ... not good.

Do we have a detailed breakdown of affected subscribers and when revocation will actually occur?

We plan to add this breakdown and will provide the information next week. We have 9 customers with pending revocations, one of which holds the large majority of the certificates impacted by this incident.

(In reply to Paul van Brouwershaven from comment #2)

Please note that we have corrected the method for calculating the number of certificates listed as re-issued with pending revocation from our previous update.

Did we ever receive a corrected figure and list of impacted certificates for this issue? This is still a Preliminary Incident Report 38 days in.

This note was about the certificates listed as re-issued with pending revocation, not about the impacted certificates. The impacted certificates are correctly stated in the Impact section of this bug.

This bug was incorrectly created as a Preliminary Incident Report (see also bug 1886532 comment 8), on the assumption that we would have to keep updating the numbers as revocation progressed. Beyond that, we have no additions to the report itself.

Update on the revocation progress:

  • 322 certificates have been revoked or expired.
  • 7 certificates have been re-issued with revocation pending.
  • 105 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked).

(In reply to Paul van Brouwershaven from comment #11)

  • 322 certificates have been revoked or expired.

There is a big difference between certificates that were revoked by the CA and certificates that expired in the time period, especially in the case of an extremely delayed revocation event or one in which there are serious concerns that the CA is placing its direct customers’ good over that of the WebPKI.

In this incident, for example, 322 of 1176 impacted certificates, or 27.4%, are revoked or expired after 49 days. To put things in perspective, if these certificates’ expirations were perfectly evenly spread over the course of the calendar year, we would expect 13.4% of them to have expired on their own.

That is NEARLY HALF of the reported “revoked or expired” certificates. In a scenario where a CA is trying to minimize the amount of revocation it performs, extended delay would be an effective technique.

I put it to the community that in the interest of transparency, Entrust should provide new current numbers that break out revoked and expired certificates separately and that moving forward, Entrust should report those numbers separately. This same comment applies to bug 1886532 and to any delayed revocation bug from any CA.
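The arithmetic behind the comparison above can be sketched as follows. This is only an estimate under the stated assumption that expirations are spread perfectly evenly over a 365-day year; the function name and the `validity_days` parameter are illustrative, and the real split between revoked and expired certificates can only come from per-certificate status.

```python
def expiry_baseline(total, revoked_or_expired, days_elapsed, validity_days=365):
    """Compare the reported 'revoked or expired' share with a uniform-expiry baseline.

    Assumes (as in the comment above) that expirations are spread evenly over
    `validity_days`; this is a rough model, not measured data.
    """
    reported_pct = 100 * revoked_or_expired / total
    expected_expired_pct = 100 * days_elapsed / validity_days
    # Fraction of the reported figure that natural expiry alone could explain.
    share_from_expiry = expected_expired_pct / reported_pct
    return reported_pct, expected_expired_pct, share_from_expiry


reported, expected, share = expiry_baseline(total=1176, revoked_or_expired=322, days_elapsed=49)
print(f"reported: {reported:.1f}%  expected expired: {expected:.1f}%  share: {share:.0%}")
```

With the figures from this bug (322 of 1176 after 49 days), the baseline accounts for a little under half of the reported total, which is where the "nearly half" comes from.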

(In reply to Tim Callan from comment #12)

(In reply to Paul van Brouwershaven from comment #11)

  • 322 certificates have been revoked or expired.

There is a big difference between certificates that were revoked by the CA and certificates that expired in the time period, especially in the case of an extremely delayed revocation event or one in which there are serious concerns that the CA is placing its direct customers’ good over that of the WebPKI.

In this incident, for example, 322 of 1176 impacted certificates, or 27.4%, are revoked or expired after 49 days. To put things in perspective, if these certificates’ expirations were perfectly evenly spread over the course of the calendar year, we would expect 13.4% of them to have expired on their own.

That is NEARLY HALF of the reported “revoked or expired” certificates. In a scenario where a CA is trying to minimize the amount of revocation it performs, extended delay would be an effective technique.

I put it to the community that in the interest of transparency, Entrust should provide new current numbers that break out revoked and expired certificates separately and that moving forward, Entrust should report those numbers separately. This same comment applies to bug 1886532 and to any delayed revocation bug from any CA.

Tim,

I agree with you that this could easily be abused but there is precedent where the certificate's natural expiration was an effective mitigation measure (https://bugzilla.mozilla.org/show_bug.cgi?id=1715672). Of course the volume of certificates of that bug is much larger than this one, but the principle remains.

Similarly, in the serial number entropy series of incidents, CAs tried to revoke certificates for weeks, even months because some of their Subscribers used public certificates in a manner that prevented "fast" replacement (unfortunately, "fast" was perceived very differently by different subscribers).

I think it would make more sense to see that the CA (Entrust in this case) is continuously putting effort, pushing for faster certificate revocation by working with their "special-case" subscribers. I'm not sure how this could be accomplished though, besides a statement by the CA.

Perhaps a listing of the affected types of subscribers (e.g. banks, government agencies, military, nuclear factories (!), etc.) and the revocation timeline from the beginning of the incident would provide more useful feedback to the community. At the very least it would show which sectors demonstrate reduced agility in replacing public certificates, and the community could work on this problem to provide useful ideas on how to improve that.

This bug is still missing the element of subscriber identification (from a quick look at other delayed revocation bugs, this seems to be a pattern of not sharing per-subscriber analysis per https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation) and I can understand the fact that there might be NDAs in place that prevent the Subscriber names from being explicitly included in a public bug. However, since these are mostly identity certificates and are included in the public CT logs, an independent analyst could very quickly compile a list of the organizationName values and post them here. I consider the sector grouping of delayed revocation subscribers more useful to assist with future improvements.

(In reply to Tim Callan from comment #12)

(In reply to Paul van Brouwershaven from comment #11)

  • 322 certificates have been revoked or expired.

There is a big difference between certificates that were revoked by the CA and certificates that expired in the time period, especially in the case of an extremely delayed revocation event or one in which there are serious concerns that the CA is placing its direct customers’ good over that of the WebPKI.

In this incident, for example, 322 of 1176 impacted certificates, or 27.4%, are revoked or expired after 49 days. To put things in perspective, if these certificates’ expirations were perfectly evenly spread over the course of the calendar year, we would expect 13.4% of them to have expired on their own.

That is NEARLY HALF of the reported “revoked or expired” certificates. In a scenario where a CA is trying to minimize the amount of revocation it performs, extended delay would be an effective technique.

I put it to the community that in the interest of transparency, Entrust should provide new current numbers that break out revoked and expired certificates separately and that moving forward, Entrust should report those numbers separately. This same comment applies to bug 1886532 and to any delayed revocation bug from any CA.

I will note I have pushed for this before and the answer I was eventually provided was:

However in short: You have not answered my question at all. I am well aware of the 6,008 OV TLS certificates mentioned. Are Entrust willing to be transparent and provide a breakdown on: "'outstanding mis-issued certificates', 'expired certificates', 'revoked certificates'" in respect to this issue?

This incident provides a crt.sh link for each certificate, so the certificate status can be found. Note, the 14 missing certificates per https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/J3aX8OKIT_A were posted to CT.

I was straight up told by Entrust to scrape crt.sh and do the research myself rather than them providing basic information.

(In reply to Dimitris Zacharopoulos from comment #13)

This bug is still missing the element of subscriber identification (from a quick look at other delayed revocation bugs, this seems to be a pattern of not sharing per-subscriber analysis per https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation) and I can understand the fact that there might be NDAs in place that prevent the Subscriber names from being explicitly included in a public bug. However, since these are mostly identity certificates and are included in the public CT logs, an independent analyst could very quickly compile a list of the organizationName values and post them here. I consider the sector grouping of delayed revocation subscribers more useful to assist with future improvements.

I don't believe there is an NDA issue as they have previously done this as late as April 2023, but apparently decided to not continue it in later incident reports: https://bugzilla.mozilla.org/show_bug.cgi?id=1804753#c0

Of course the volume of certificates of that bug is much larger than this one, but the principle remains.

I disagree with this because:

  1. That incident had actual action items.
  2. That incident created ARI.
  3. That incident wasn't as easily preventable with simple linting.
  4. The certificates impacted had 90 days-ish (sorry, had to :P) validity. This means that the impact of the misissued certificates was significantly more limited. Meanwhile, for Entrust, it's going to take them significantly longer than 90 days to revoke these certs at the speed the revocations are happening.
  5. Let's Encrypt hasn't re-used their argument for why they couldn't revoke since then as far as I know.

Furthermore, those misissued certificates weren't breaking any part of the BRs directly, and just breaking the CP/CPS of the CA. The BRs do not make a recognition for this technicality, so I won't really focus on it. In the case of this incident for Entrust's failure to revoke on time, the misissuance is a fundamental divergence from the accepted certificate profiles.

Also, I agree with Wayne that the NDA argument does not make sense here for why the per-subscriber breakdown hasn't happened. Beyond that, that document does not apply to any of the other root programs.

I think it would make more sense to see that the CA (Entrust in this case) is continuously putting effort, pushing for faster certificate revocation by working with their "special-case" subscribers. I'm not sure how this could be accomplished though, besides a statement by the CA.

It's actually pretty simple. Root programs can enforce the rules that are written as part of the BRs. Most of us participating here know that large enterprises will not actually prioritize automating their certificate issuance, until they have their hands forced to do so. What's been frustrating with all of this is that, everyone else has to pay the price of ${BIG_ENTERPRISE} not caring about solving this problem.

Work towards automated certificate issuance simply won't be prioritized until the current model of appeasement stops being acceptable by the root programs.

Statements from CAs, especially CAs that haven't kept their promises made in the past, are not worth much.

(In reply to amir from comment #15)

Of course the volume of certificates of that bug is much larger than this one, but the principle remains.

I disagree with this because:

  1. That incident had actual action items.
  2. That incident created ARI.
  3. That incident wasn't as easily preventable with simple linting.
  4. The certificates impacted had 90 days-ish (sorry, had to :P) validity. This means that the impact of the misissued certificates was significantly more limited. Meanwhile, for Entrust, it's going to take them significantly longer than 90 days to revoke these certs at the speed the revocations are happening.
  5. Let's Encrypt hasn't re-used their argument for why they couldn't revoke since then as far as I know.

I wasn't arguing about any of that. Theoretically, something good should come out of every public incident, and LE's bug is a great example of that. I was referring to the position of not revoking and letting certificates expire naturally as a principle, which seems to be the concern raised by Tim Callan.

During that referred incident, the CA performed a risk analysis, shared their arguments and their decision not to revoke. It was considered acceptable by the community and IMO rightfully so. This establishes the fact that in some circumstances of non-compliance (either against the BRs or the CA's CP/CPS), it is reasonable for a CA to analyze the risks of Relying Parties associated with the mis-issuance, share their results and an opinion for what they consider a best course of action (revoke or not revoke). The community is usually very responsible and evaluates those positions very carefully. However, the community also needs to be fair and take into account past incidents with similar patterns (like the reasonable right for performing a risk analysis of how RPs would be affected by the mis-issuance).

Furthermore, those misissued certificates weren't breaking any part of the BRs directly, and just breaking the CP/CPS of the CA. The BRs do not make a recognition for this technicality, so I won't really focus on it. In the case of this incident for Entrust's failure to revoke on time, the misissuance is a fundamental divergence from the accepted certificate profiles.

While Entrust has enough important incidents/mistakes to answer for, during this series of incidents there were similar issues with their CP/CPS, just like the referred incident. Regardless, those CP/CPS issues were evaluated by the community with the same strictness as directly breaking parts of the BRs. I'm not saying that this is negligible but I'm trying to be fair, comparing other incidents, especially when a series of revocations already took place not too long ago, and the same Subscribers need to perform another round of replacements in a very short period of time. I'm referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1890685.

Also, I agree with Wayne that the NDA argument does not make sense here for why the per-subscriber breakdown hasn't happened. Beyond that, that document does not apply to any of the other root programs.

I think it would make more sense to see that the CA (Entrust in this case) is continuously putting effort, pushing for faster certificate revocation by working with their "special-case" subscribers. I'm not sure how this could be accomplished though, besides a statement by the CA.

It's actually pretty simple. Root programs can enforce the rules that are written as part of the BRs. Most of us participating here know that large enterprises will not actually prioritize automating their certificate issuance, until they have their hands forced to do so. What's been frustrating with all of this is that, everyone else has to pay the price of ${BIG_ENTERPRISE} not caring about solving this problem.

Work towards automated certificate issuance simply won't be prioritized until the current model of appeasement stops being acceptable by the root programs.

Statements from CAs, especially CAs that haven't kept their promises made in the past, are not worth much.

I was mainly wondering what evidence would be reasonable to obtain in order for the community to independently determine whether the CA is actively pushing for faster revocations and not being "relaxed", waiting for the certificates to naturally expire.

Just to clarify, my initial post was trying to follow up on Tim's comment about CAs letting certificates expire naturally instead of pushing for faster revocations.

(In reply to Dimitris Zacharopoulos from comment #16)

I was mainly wondering what evidence would be reasonable to obtain in order for the community to independently determine whether the CA is actively pushing for faster revocations and not being "relaxed", waiting for the certificates to naturally expire.

Just to clarify, my initial post was trying to follow up on Tim's comment about CAs letting certificates expire naturally instead of pushing for faster revocations.

This can be determined independently as the list of impacted certificates is public. There is already an expectation to provide information on expired certificates independently of revoked certificates, alongside when they're expected to expire. If this were a single issue in a vacuum, we could have that discussion, but unfortunately there is a trail of incidents going back years with the same fact pattern.

To test the expiration theory, I needed the list of mis-issued clientAuth certificates. Entrust provided a list of 1176 and states, in a very lackluster timeline, that they stopped mis-issuance on 2024-03-20. So I turned to censys.io to try to reverse engineer a search pattern that finds the impacted certificates.

Unfortunately, when trying to scope out this issue I noticed that 1222 certificates appear on censys.io. A few can be put down to a badly scoped search, since this is only a rough idea. So I ran with a different theory and searched for clientAuth certificates lacking serverAuth since 2024-03-20 (inclusive). I got 52 hits, 10 of which are revoked:

crt url Not Before Revoked
https://crt.sh/?sha256=893D0C9433D9A373C679583A28618B4D0E3FCB5289200E04FB466034CB3719DD 2024-03-22 2024-04-22
https://crt.sh/?sha256=23E84AFABA9BF73CF0E658143C948DD138670F7DD21538659112F81A29CB41B4 2024-03-25 2024-04-25
https://crt.sh/?sha256=FD46341FC6DA4FF1DC5B8CD8BFF0E57B7E141F1E54B8AAF342FECF99761012A1 2024-03-25 2024-04-25
https://crt.sh/?sha256=AD76053029AC0AE32AB1D77A66D10DE1AF57942B3CD632B76F88C6FAA849707B 2024-03-25 2024-04-26
https://crt.sh/?sha256=D42B39184458D78A98F8956A3651EC642DA3DEFDAE4782C6CAB05D9A58D50D88 2024-03-25 2024-04-25
https://crt.sh/?sha256=77BE5E8D34F321840E0A374A06867AFFEA45126BD2CB8ACDF838D63880E0DF51 2024-03-25 2024-04-24
https://crt.sh/?sha256=5989B5B41080774CEDADBC5BCBFC71E89DBD2A9EAEB5E12CED00871A442E7CD8 2024-03-25 2024-04-25
https://crt.sh/?sha256=0E58E57320903A97813CB5D97279CB69C2C554C139EE8726C74C0431E9311BC6 2024-03-25 2024-04-19
https://crt.sh/?sha256=D81B13BE20FF163CAC03C6FB8A5754BF245EB1487784759ED5D4A6E0B79D4790 2024-03-22 2024-04-22
https://crt.sh/?sha256=9A44D0C7EA08EF8B811139217B447DDD47BD0961C69B0F759C93E654315262F8 2024-03-25 2024-04-05

The Not Before dates on these certificates range from 2024-03-22 to 2024-03-25. In the larger query I can see unrevoked certificates with Not Before dates up to 2024-03-26. The Entrust EV policy OIDs are attached to all of the above certificates too.
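As a rough aid for reading the table, the sketch below computes how many days passed between each certificate's Not Before date and its revocation date. The date pairs are copied from the ten rows above; the function name is illustrative. It measures certificate age at revocation only, so it is not a direct measure of revocation delay, which would have to be counted from the date the CA became aware of each certificate.

```python
from datetime import date

# (Not Before, Revoked) pairs copied from the ten rows listed above, in order.
rows = [
    ("2024-03-22", "2024-04-22"),
    ("2024-03-25", "2024-04-25"),
    ("2024-03-25", "2024-04-25"),
    ("2024-03-25", "2024-04-26"),
    ("2024-03-25", "2024-04-25"),
    ("2024-03-25", "2024-04-24"),
    ("2024-03-25", "2024-04-25"),
    ("2024-03-25", "2024-04-19"),
    ("2024-03-22", "2024-04-22"),
    ("2024-03-25", "2024-04-05"),
]


def days_to_revocation(pairs):
    """Days between Not Before and revocation for each pair (illustrative helper)."""
    return [
        (date.fromisoformat(revoked) - date.fromisoformat(not_before)).days
        for not_before, revoked in pairs
    ]


ages = days_to_revocation(rows)
print("days from Not Before to revocation:", ages)
print("shortest:", min(ages), "longest:", max(ages))
```

Every one of the ten certificates was more than five days old when it was revoked.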

Can someone weigh in on this situation?

Flags: needinfo?(paul.vanbrouwershaven)

(In reply to Wayne from comment #17)

(In reply to Dimitris Zacharopoulos from comment #16)

I was mainly wondering what evidence would be reasonable to obtain in order for the community to independently determine whether the CA is actively pushing for faster revocations and not being "relaxed", waiting for the certificates to naturally expire.

Just to clarify, my initial post was trying to follow up on Tim's comment about CAs letting certificates expire naturally instead of pushing for faster revocations.

This can be determined independently as the list of impacted certificates is public. There is already an expectation to provide information on expired certificates independently of revoked certificates, alongside when they're expected to expire. If this were a single issue in a vacuum, we could have that discussion, but unfortunately there is a trail of incidents going back years with the same fact pattern.

Checking the rate at which impacted certificates get revoked is easy, but it's difficult to get feedback about the CA's effort in pushing its subscribers for faster revocation. Thinking out loud, I think the community would learn something if there was a breakdown of impacted certificates per Subscriber industry group, with the number of impacted certificates and the revocations of those certificates at weekly intervals. Here's a thought with dummy numbers. Obviously, if the community finds this useful, it could be requested from CAs with open delayed revocation incidents and added to the incident handling procedure.

Industry Group | Impacted Certificates | Revoked at week 1 | Revoked at week 2 | Revoked at week 3 | Revoked at week x
Financial Institutions | 100 | 80 | 10 | 10 | -
Government Agencies | 70 | 10 | 20 | 40 | -
Health Institutions | 200 | 10 | 30 | 40 | 50... and so on
Energy Providers | 50 | 10 | 10 | 30 | -
Telecommunication Providers | 300 | 200 | 50 | 50 | -
... | ... | ... | ... | ... | ...

This might give a sense of which sectors find it more difficult to replace certificates and allow the community to prioritize working on solutions for those specific problematic critical sectors (in this example, the health institutions would stand out). Perhaps there are challenges that we do not fully understand, or that those industries don't understand, and we need to develop guidance focusing on the specific replacement challenges these sectors face.
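To make the idea concrete, here is a minimal sketch of how such a breakdown could be summarized. It uses only the dummy numbers from the table above for weeks 1 to 3 (the open-ended "week x" column is left out); the data layout and function names are hypothetical and not part of any existing reporting requirement.

```python
# Dummy numbers from the proposed table: sector -> (impacted, revoked per week for weeks 1-3).
breakdown = {
    "Financial Institutions": (100, [80, 10, 10]),
    "Government Agencies": (70, [10, 20, 40]),
    "Health Institutions": (200, [10, 30, 40]),
    "Energy Providers": (50, [10, 10, 30]),
    "Telecommunication Providers": (300, [200, 50, 50]),
}


def cumulative_pct(impacted, weekly_revoked):
    """Cumulative percentage of impacted certificates revoked after each week."""
    result, running_total = [], 0
    for revoked in weekly_revoked:
        running_total += revoked
        result.append(100 * running_total / impacted)
    return result


# Cumulative progress per sector, and the sector furthest behind after week 3.
progress = {sector: cumulative_pct(n, weekly) for sector, (n, weekly) in breakdown.items()}
slowest = min(progress, key=lambda sector: progress[sector][-1])

for sector, pcts in progress.items():
    print(f"{sector}: " + ", ".join(f"{p:.0f}%" for p in pcts))
print("furthest behind after week 3:", slowest)
```

With the dummy data, every sector except the health institutions has fully revoked by week 3, which is the kind of outlier this breakdown is meant to surface.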

Take for example https://bugzilla.mozilla.org/show_bug.cgi?id=1891331#c8 which is some feedback from these critical sectors. Many of those challenges can be easily addressed (like: "No, you are not supposed to get permission from a Supervisory Authority to replace a TLS Certificate, and if you are, you are probably not supposed to be using a Public TLS Certificate") but some might require more discussion and changes in policy.

It might even give a sense of timing for these replacements so we can re-evaluate the 5-day rule. For example, if the majority of critical sectors manage to replace their certificates in 2 weeks, it may be useful evidence for discussion and re-evaluation of the 5-day revocation rule.

Like I said, this is just a suggestion because to me, I don't find it very useful to just see reports like: "from the beginning of this incident, we revoked 60 out of 200 certificates". It doesn't help me get any deeper understanding about the type of subscribers that are finding it so hard to replace certificates so I can think and propose solutions/guidance for them.

We should probably not hijack this incident bug and move this discussion to m.d.s.p. I will try to do that next week.

(In reply to Dimitris Zacharopoulos from comment #16)

and the same Subscribers need to perform another round of replacements in a very short period of time. I'm referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1890685.

I don’t understand how we can possibly see “previous incident occurred recently” as a reason to accept a longer revocation timeline. If a subscriber just replaced a cert then they know the process works, have a list of affected systems already, and there’s no risk of the one person who knows how being gone. That aside, revocation of certificates is intended to preserve the integrity of the WebPKI and allow relying parties to make decisions—minor or major—based on the assumption that the BRs and root program requirements are being met in good faith and with earnest effort to comply. The recency of a preceding revocation event for a given subscriber doesn’t alter anything in the calculus for a given event.

Subscribers who wish to reduce the likelihood of revocation-related incidents should choose a CA that doesn’t have a lot of incidents that (should) cause revocation, I guess. I fear that’s a big part of what Entrust is trying to avoid by reducing the number of times they need to tell their subscribers that their certificates were misissued.

(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #19)

(In reply to Dimitris Zacharopoulos from comment #16)

and the same Subscribers need to perform another round of replacements in a very short period of time. I'm referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1890685.

I don’t understand how we can possibly see “previous incident occurred recently” as a reason to accept a longer revocation timeline. If a subscriber just replaced a cert then they know the process works, have a list of affected systems already, and there’s no risk of the one person who knows how being gone. That aside, revocation of certificates is intended to preserve the integrity of the WebPKI and allow relying parties to make decisions—minor or major—based on the assumption that the BRs and root program requirements are being met in good faith and with earnest effort to comply. The recency of a preceding revocation event for a given subscriber doesn’t alter anything in the calculus for a given event.

You will need to view my comment #16 in connection with other similar incidents where CAs didn't revoke, despite mis-issuance, and it was acceptable by the community.

Subscribers who wish to reduce the likelihood of revocation-related incidents should choose a CA that doesn’t have a lot of incidents that (should) cause revocation, I guess. I fear that’s a big part of what Entrust is trying to avoid by reducing the number of times they need to tell their subscribers that their certificates were misissued.

I'm afraid this statement is contrary to the spirit of incident reporting of the Mozilla and other Root Programs. CAs are encouraged to disclose incidents for increased transparency. Counting "how many incidents have been opened by a CA" to determine "how good/bad a CA is", leads to false assumptions.

CAs are encouraged to disclose incidents for increased transparency. Counting "how many incidents have been opened by a CA" to determine "how good/bad a CA is", leads to false assumptions.

This is a bad faith reading of what Mike was stating.

But first:

CAs are encouraged to disclose incidents for increased transparency.

No. They’re required to disclose incidents. Willful non disclosure of incidents is the any% speedrun for being distrusted.

Counting "how many incidents have been opened by a CA" to determine "how good/bad a CA is", leads to false assumptions.

No one here is suggesting counting these incidents, but rather looking at the quality of the incident response, how preventable it was, and what actions came out of the incident. For example, if a CA keeps having a repeat of the same incident type, then that’s generally a sign that something is bad.

When Entrust has 1) broken their promise to not delay revocation 2) not provided a reasonable set of action items to prevent the reoccurrence of these incidents, then those assumptions about a CA are made. And those assumptions are reasonable ones.

(In reply to amir from comment #21)

CAs are encouraged to disclose incidents for increased transparency. Counting "how many incidents have been opened by a CA" to determine "how good/bad a CA is", leads to false assumptions.

This is a bad faith reading of what Mike was stating.

How did you come to that conclusion? It was totally not read in bad faith, this has been discussed before in m.d.s.p. and Ben Wilson could comment on that. How else can one read the following:

"subscribers [...] should choose a CA that doesn’t have a lot of incidents that (should) cause revocation"

This is a very clear position that can't be easily mis-read :-) Perhaps the commenter wanted to say something different but that was my interpretation and it wasn't done in bad faith.

BTW, with the current rules, revocation is the only available tool for remediating violations associated with Certificates.

But first:

CAs are encouraged to disclose incidents for increased transparency.

No. They’re required to disclose incidents. Willful non-disclosure of incidents is the any% speedrun for being distrusted.

"No" is a very absolute way to comment on a person's opinion...

As this issue has been discussed before: if the community's interpretation is that the more incidents resulting in revocation a CA has, the worse it should be considered, you will see considerably fewer incidents being opened in the future, especially in cases where there is no external way to detect the reported issue. IMO that's not how incidents should be read. "Transparency" is not easily achievable when there is an axe hanging above the CA's neck. Again, I hope Ben Wilson and other Root Store Program Managers can weigh in on whether my reading is aligned with their expectations. It's useful to clarify.

BTW, disclosing an incident doesn't mean a CA should not take ownership of their mistakes and share the necessary information to assure the community that the (root) causes have been properly mitigated and the CA is in full control and understands the expectations/requirements.

Counting "how many incidents have been opened by a CA" to determine "how good/bad a CA is", leads to false assumptions.

No one here is suggesting counting these incidents, but rather looking at the quality of the incident response, how preventable it was, and what actions came out of the incident. For example, if a CA keeps having a repeat of the same incident type, then that’s generally a sign that something is bad.

100% agreed. Incidents should be seen as opportunities for CAs to share details about their operations/practices/system design in order to assure the community that they understand the expectations/requirements, and promptly respond on how they will fix the reported issues.

Perhaps I misunderstood the statement for counting incidents and I hope the author can clarify if there was a different message he wanted to convey.

When Entrust has 1) broken their promise to not delay revocation 2) not provided a reasonable set of action items to prevent the reoccurrence of these incidents, then those assumptions about a CA are made. And those assumptions are reasonable ones.

I agree, and this is the part of an incident where any CA must demonstrate that they understand the expectations/requirements and promptly respond on how they will fix the reported issues. The Overview of Responding to an Incident and Maintenance and Enforcement of MRSP sets the expectations. (Side note: The "Examples of Good Practice" section may need to be updated because the community has seen equally good, if not better, responses in recent years.)

Subscribers that want to do some analysis about which CA to work with should not just read the count() of the open/closed incidents; they must read each one in order to understand the nature of the mistake(s), the impact, and the remediation actions. Then they will have a better understanding of some important decision-making elements, like:

  • the impact of the incident
  • whether the issue(s) were avoidable
  • the effectiveness of CA's remediation actions
  • how well a CA understands the expectations/requirements set by the Mozilla Root Program (and others, since they all practically prefer the Bugzilla as a tool to manage CA incidents).

I think we're mostly in agreement but please let me know if I mis-read something.

(In reply to Dimitris Zacharopoulos from comment #13)

This bug is still missing the element of subscriber identification (from a quick look at other delayed revocation bugs, this seems to be a pattern of not sharing per-subscriber analysis per https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation)

Yes, and other comments have pointed out recently on this forum that this requirement exists. Nonetheless, nearly every CA with a delayed revocation incident continues to ignore it.

This is an important point to underscore. The majority of the late revocation reports we see imply that the CA would love to revoke the certificates on time and did its best, but those pesky Subscribers and their lack of automation make it impossible.

No matter what, it is 100% in the CA’s control to provide a per-subscriber explanation for the delayed revocation. For any CA that is truly trying its earnest best to revoke certificates on time, this list will be trivially easy to report, as it will be produced as a by-product of the CA’s best efforts to perform a compliant revocation. In addition to separating expired from revoked certificates in reports, CAs should produce this list on a subscriber-by-subscriber basis before the revocation deadline for every late revocation event.

Here is our progress, considering the recent comments to this bug:

  • 359 of 1,176 (30.5%) certificates have been revoked or expired.
    • 346 revoked and not expired
    • 13 expired (before or after revocation)
  • 2 certificates have been re-issued with revocation pending.
  • 110 out of 114 customer accounts (96.5%) have fully remediated the issue (certificates re-issued and old certificates revoked or expired).

Of the 4 remaining customer accounts 3 are financial institutions, and 809 of the remaining 817 certificates (99.0%) belong to a single subscriber.
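The percentages in this update can be re-derived from the raw counts. The following is a minimal sketch that only re-does the arithmetic on the figures quoted above; the variable names and the `pct` helper are illustrative and are not part of any Entrust tooling.

```python
# Figures copied from the progress update above; names are illustrative only.
total_certs = 1176
revoked_not_expired = 346
expired = 13
total_accounts = 114
remediated_accounts = 110
largest_subscriber_remaining = 809


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


done = revoked_not_expired + expired   # certificates revoked or expired
remaining = total_certs - done         # certificates still outstanding

print("revoked or expired:", done, f"({pct(done, total_certs)}%)")
print("remaining:", remaining)
print("accounts remediated:", f"{pct(remediated_accounts, total_accounts)}%")
print("largest subscriber share of remaining:",
      f"{pct(largest_subscriber_remaining, remaining)}%")
```

The per-customer figures listed below (809, 1, 2, and 5 certificates remaining) sum to the same 817 outstanding certificates.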

We are working with each customer to accelerate the time to revocation.

Here is the information per customer:

MasterCard International Incorporated

  • Industry: financial
  • Reason for delay: Dependency in worldwide payments network. The certificate profile needed to be updated, and each certificate needs to be tested with each partner before it can be taken into production.
  • Progress: 809 of 872 certificates remaining
  • Estimated date of completion: May 30, 2024

Ciena Corporation

  • Industry: IT
  • Reason for delay: Inadequate automation/resource constraints: more time required to perform manual replacement
  • Progress: 1 of 1 certificate remaining
  • Estimated date of completion: May 17, 2024

Prudential Financial

  • Industry: financial
  • Reason for delay: Inadequate automation/resource constraints: more time required to perform manual replacement
  • Progress: 2 of 4 certificates remaining
  • Estimated date of completion: May 30, 2024

Bank of America

  • Industry: financial
  • Reason for delay: This additional time is necessary to test/confirm replacement has been successful and no outages will be triggered by revocation
  • Progress: 5 of 5 certificates remaining
  • Estimated date of completion: May 30, 2024
Flags: needinfo?(paul.vanbrouwershaven)

I will put forward that if any of your subscribers, none of which are at all lacking in resources, is incapable of handling revocation within 57 days (or over 70 by your proposed timeline), then they shouldn't be relying on a public root to begin with. This incident has been ongoing since early March and subscribers were approached around March 20th. We are in the middle of May; if the companies involved do not wish to put resources into handling this, then it was an educational problem on the CA's behalf.

I have not seen any scenario proposed that is an immediate danger to human life, and that we're still dancing around trying to give additional weeks at this point is abysmal. It is your job as a Certificate Authority to make sure your Subscribers understand that you have a regulatory duty to revoke within 5 days. Failure to do so puts your company's continued operation in this space at risk, and the continued claims of ignorance of your Subscribers are not a shield; they are emblematic of your continued failure to perform your duties.

What is being done in the immediate term to make sure this will never happen again? What was done in the previous incidents to make sure this will never happen again, and why did all of those steps fail despite every promise being made?

I agree with Comment #20, Comment #21, and Comment #22 that root programs want to encourage reporting for transparency purposes, not discourage it, and that it is not the quantity of incident reports, but the quality of the incident reports that we care about.

(In reply to Paul van Brouwershaven from comment #24)

  • 359 of 1,176 (30.5%) certificates have been revoked or expired.
    • 346 revoked and not expired
    • 13 expired (before or after revocation)

Paul,

First of all, thank you for adding this information. It helps transparency and should be a required practice for all late revocation events. I encourage other CAs to follow this example. I look forward to a similar breakdown in your future reports for bug 1886532, bug 1890685, and bug 1890898.

Please help me understand what I’m reading here. When you say.

  • 13 expired (before or after revocation)

…I want to be sure I’m interpreting “after revocation” correctly. Are you saying that some of these 13 expired on their own while others are now beyond the notAfter date but that Entrust revoked them prior to expiration?

If so, I don’t think it’s important to track when they would have expired. The key point is did the CA revoke them or just let them expire on their own. It is more useful to the community to divide the certificates into those that were revoked by the CA and those that were active until expiration. Two simple, unambiguous numbers.
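The two-number split proposed here can be sketched as a small classification rule. This is an illustrative sketch only: the `CertRecord` type, its field names, the category labels, and the sample records are all hypothetical and do not reflect any CA's actual data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CertRecord:
    """Hypothetical record; field names are assumptions for illustration."""
    serial: str
    not_after: datetime
    revoked_at: Optional[datetime] = None


def classify(cert: CertRecord, as_of: datetime) -> str:
    """Place a certificate into one of three reporting buckets."""
    # Revoked by the CA while the certificate was still within its validity.
    if cert.revoked_at is not None and cert.revoked_at < cert.not_after:
        return "revoked_by_ca"
    # Never revoked in time: it stayed usable until its notAfter date passed.
    if cert.not_after <= as_of:
        return "active_until_expiration"
    # Still valid and not yet revoked.
    return "outstanding"


# Made-up sample data, not figures from this bug.
as_of = datetime(2024, 5, 20)
records = [
    CertRecord("A1", datetime(2024, 12, 1), revoked_at=datetime(2024, 4, 2)),
    CertRecord("A2", datetime(2024, 5, 1)),
    CertRecord("A3", datetime(2025, 1, 15)),
    CertRecord("A4", datetime(2024, 4, 10), revoked_at=datetime(2024, 4, 1)),
]
tally = Counter(classify(r, as_of) for r in records)
print(dict(tally))
```

Under this rule a certificate that was revoked and has since passed its notAfter date (A4 in the sample) still counts as revoked by the CA, which removes the "before or after revocation" ambiguity.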

(In reply to Wayne from comment #25)

I will put forward that if any of your subscribers, none of which are at all lacking in resources, is incapable of handling revocation within 57 days (or over 70 by your proposed timeline), then they shouldn't be relying on a public root to begin with. This incident has been ongoing since early March and subscribers were approached around March 20th. We are in the middle of May; if the companies involved do not wish to put resources into handling this, then it was an educational problem on the CA's behalf.

We agree that there is an opportunity for us to do more education with our subscribers around the requirements surrounding public trust certificates and the options available to them to ease the revocation and re-issuance process (e.g., automation tools) or the option to move to a private or shared PKI instead of the WebPKI. This is one of the items included in our forthcoming report to the community around remedial measures. Our larger subscribers operating critical infrastructure often tell us that they have internal requirements preventing them from meeting quick revocation timelines that even automation would not overcome (e.g., mandatory change control processes, layers of required approval for infrastructure changes, coordination with multiple business teams, etc.). We think it may be helpful in the future to bring some of this subscriber feedback to a future CA/B Forum meeting for further discussion and consideration.

I have not seen any scenario proposed that is an immediate danger to human life, and that we're still dancing around trying to give additional weeks at this point is abysmal. It is your job as a Certificate Authority to make sure your Subscribers understand that you have a regulatory duty to revoke within 5 days. Failure to do so puts your company's continued operation in this space at risk, and the continued claims of ignorance of your Subscribers are not a shield; they are emblematic of your continued failure to perform your duties.

While there may not be an immediate danger to human life, a lot of our subscribers operate critical infrastructure. If we were to revoke and they were not able to immediately re-issue, there would be significant disruption to the web ecosystem.

What is being done in the immediate term to make sure this will never happen again? What was done in the previous incidents to make sure this will never happen again, and why did all of those steps fail despite every promise being made?

We have assessed the root cause of this and the other mis-issuance events and developed a robust remediation plan to address each root cause. This information will be laid out in detail in the forthcoming report to the community. Examples include redundancies in reviewing and interpreting changes in requirements; more robust change control procedures; tightened incident response procedures, and, to your earlier point, more education for our subscribers. Addressing the root cause of what led to each mis-issuance in the first place is within our control and we feel confident that the remediation plan we will be proposing will reduce the risk of future mis-issuances.

(In reply to Ben Wilson from comment #26)

I agree with Comment #20, Comment #21, and Comment #22 that root programs want to encourage reporting for transparency purposes, not discourage it, and that it is not the quantity of incident reports, but the quality of the incident reports that we care about.

We understand and agree with your comment, Ben.

(In reply to Tim Callan from comment #27)

(In reply to Paul van Brouwershaven from comment #24)

  • 359 of 1,176 (30.5%) certificates have been revoked or expired.
    • 346 revoked and not expired
    • 13 expired (before or after revocation)

Paul,

First of all, thank you for adding this information. It helps transparency and should be a required practice for all late revocation events. I encourage other CAs to follow this example. I look forward to a similar breakdown in your future reports for bug 1886532, bug 1890685, and bug 1890898.

Please help me understand what I’m reading here. When you say.

  • 13 expired (before or after revocation)

…I want to be sure I’m interpreting “after revocation” correctly. Are you saying that some of these 13 expired on their own while others are now beyond the notAfter date but that Entrust revoked them prior to expiration?

If so, I don’t think it’s important to track when they would have expired. The key point is did the CA revoke them or just let them expire on their own. It is more useful to the community to divide the certificates into those that were revoked by the CA and those that were active until expiration. Two simple, unambiguous numbers.

Thank you for this suggestion. The data did not clarify whether these certificates were revoked before they expired. We agree this is valuable information and will ensure we include this distinction in our future updates.

Here is our progress:

  • 412 of 1,176 (35.0%) certificates have been revoked or expired.
    • 409 revoked before expiration
    • 3 expired before revocation
  • 0 certificates have been re-issued with revocation pending.
  • 112 out of 114 customer accounts (98.2%) have fully remediated the issue (certificates re-issued and old certificates revoked or expired).

Here is the information per customer. We are working with each customer to accelerate the time to revocation.

  • Subscriber 1 (see comment 24 above) has 759 of 872 certificates remaining with an estimated date of completion - May 30, 2024.
  • Subscriber 4 (see comment 24 above) has 5 of 5 certificates remaining with an estimated date of completion – May 30, 2024.

(In reply to Paul van Brouwershaven from comment #28)

While there may not be an immediate danger to human life, a lot of our subscribers operate critical infrastructure. If we were to revoke and they were not able to immediately re-issue, there would be significant disruption to the web ecosystem.

How do you define “the web ecosystem” here, and what would the disruption be? What aspects of the web beyond the specific sites operated by the subscriber would be affected?

I think it’s important to distinguish “disruption to subscriber’s business and inconvenience for subscriber’s customers” from “disruption to the web ecosystem, because this service is one that will have intolerable cascading effects on other aspects of the web”. The web is very resilient, and diverse. If someone can’t book a flight on Airline A because that site is down, then they will use Airline B. (If they’ve already got a ticket, they will check in at the airport, or call the service line for changes, etc. If my bank’s web site is down, which happens, I can call or go to a branch.) We’ve all encountered times when the web sites of important businesses are broken, and it’s very rare that it has systematic effects.

One way to think about it is: “how would the broader web be impaired if this company went out of business tomorrow?” I would expect to hear about things like major cloud service providers being inoperable, peering arrangements to be busted, nobody (versus the customers of one vendor) being able to update DNS records, or a browser’s update function breaking.

Each of these subscribers has chosen to take the risk of disruption if their certificate is revoked or otherwise lost. They have decided not to invest in automation or process agility that would let them move to another certificate promptly (5 days! that’s a long time!), in favour of some other business investment. That is a perfectly fine decision for them to make, but it is unacceptable for Entrust to unilaterally decide to shift that risk to the ecosystem, especially given previous promises.

(In reply to Wayne from comment #25)

What is being done in the immediate term to make sure this will never happen again? What was done in the previous incidents to make sure this will never happen again, and why did all of those steps fail despite every promise being made?

We have assessed the root cause of this and the other mis-issuance events and developed a robust remediation plan to address each root cause. This information will be laid out in detail in the forthcoming report to the community. Examples include redundancies in reviewing and interpreting changes in requirements; more robust change control procedures; tightened incident response procedures, and, to your earlier point, more education for our subscribers. Addressing the root cause of what led to each mis-issuance in the first place is within our control and we feel confident that the remediation plan we will be proposing will reduce the risk of future mis-issuances.

I think you are answering a different question from the one that was asked, though I don’t blame you for mixing up the many open incidents.

What you need to answer is not about misissuance, but about the topic of this incident, delayed revocation.

What are you doing to make sure that if Entrust misissues a bank’s certificate in July, we won’t see another delayed revocation? You say it yourself, I think exactly correctly: the measures taken against misissuance reduce the risk, they do not eliminate it. The BRs and WebPKI in general understand that misissuance is a virtual certainty over the long term, which is why the BRs have so much content about what is expected of a CA when they have misissued certificates. Entrust has chronically not met those expectations, and it’s the path to meeting them in the future that needs to be clear if Entrust is to remain allowed to issue WebPKI certificates that are relied upon by billions of humans and machines.

Additionally, it is important for you to explain what was done the last time Entrust promised not to delay revocation again, four years ago. Why did it not work? What is going to be different this time?

Flags: needinfo?(paul.vanbrouwershaven)

(In reply to Mike Shaver from comment #32)

(In reply to Paul van Brouwershaven from comment #28)

While there may not be an immediate danger to human life, a lot of our subscribers operate critical infrastructure. If we were to revoke and they were not able to immediately re-issue, there would be significant disruption to the web ecosystem.

How do you define “the web ecosystem” here, and what would the disruption be? What aspects of the web beyond the specific sites operated by the subscriber would be affected?

Entrust defines disruption of the web ecosystem as involving critical infrastructure that is impactful to the economy or society. This would include effects on a vast subscriber customer-base, especially when there is significant impact to relying parties.

It might appear that customers of a subscriber can easily move from one service to another. The disruption is more significant because most of these subscribers are dealing with thousands or millions of consumers who have financial information located with a specific provider, have tickets with a certain airline, or have other financial transactions that are held with a subscriber. This potential disruption is further compounded when these subscribers have a large number of certificates.

Entrust does not want to make unilateral decisions to choose which subscribers get an extension and which subscribers do not. We are very much asking for assistance to seek a solution that will successfully address your concerns.

I think it’s important to distinguish “disruption to subscriber’s business and inconvenience for subscriber’s customers” from “disruption to the web ecosystem, because this service is one that will have intolerable cascading effects on other aspects of the web”. The web is very resilient, and diverse. If someone can’t book a flight on Airline A because that site is down, then they will use Airline B. (If they’ve already got a ticket, they will check in at the airport, or call the service line for changes, etc. If my bank’s web site is down, which happens, I can call or go to a branch.) We’ve all encountered times when the web sites of important businesses are broken, and it’s very rare that it has systematic effects.

One way to think about it is: “how would the broader web be impaired if this company went out of business tomorrow?” I would expect to hear about things like major cloud service providers being inoperable, peering arrangements to be busted, nobody (versus the customers of one vendor) being able to update DNS records, or a browser’s update function breaking.

Each of these subscribers has chosen to take the risk of disruption if their certificate is revoked or otherwise lost. They have decided not to invest in automation or process agility that would let them move to another certificate promptly (5 days! that’s a long time!), in favour of some other business investment. That is a perfectly fine decision for them to make, but it is unacceptable for Entrust to unilaterally decide to shift that risk to the ecosystem, especially given previous promises.

(In reply to Wayne from comment #25)

What is being done in the immediate term to make sure this will never happen again? What was done in the previous incidents to make sure this will never happen again, and why did all of those steps fail despite every promise being made?

For the immediate term, we have implemented PKIlint for post linting to detect mis-issued certificates. We have also been proactively offering these subscribers several free and paid options for automation and offering alternatives like private PKI. We are also considering investing in contributing to open source linting tools and process tracker tools for verification and issuance governance.
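As an illustration of the kind of condition a post-issuance check would look for in this incident (a certificate asserting id-kp-clientAuth without id-kp-serverAuth, per the bug title), here is a minimal sketch. It is not PKIlint's API; the function name is hypothetical, and it assumes the EKU OIDs have already been extracted from each certificate as dotted strings.

```python
# Standard extended key usage OIDs (RFC 5280).
SERVER_AUTH = "1.3.6.1.5.5.7.3.1"  # id-kp-serverAuth
CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"  # id-kp-clientAuth


def is_client_auth_only(eku_oids):
    """Hypothetical check: True if clientAuth is asserted without serverAuth."""
    ekus = set(eku_oids)
    return CLIENT_AUTH in ekus and SERVER_AUTH not in ekus


# Example inputs: EKU lists as they might be extracted from certificates.
print(is_client_auth_only([CLIENT_AUTH]))               # flagged
print(is_client_auth_only([SERVER_AUTH, CLIENT_AUTH]))  # not flagged
```

A real linter parses the certificate itself and applies many more profile rules; this sketch only shows the single EKU condition at issue here.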

In previous instances, we made commitments but believed that our existing processes and policies would be sufficient to ensure that we could largely avoid delayed revocation in a future instance. We conducted webinars, subscriber/partner events, and private meetings about the need for crypto agility and how security vulnerabilities or other events require subscribers to revoke and replace certificates quickly. We also spent significant development resources creating integration tools for the major CLM providers, our own internal solutions, ACME, and ticketing systems like ServiceNow. We encourage all our customers to use these systems and provide many of these services free of charge. In hindsight, we did not anticipate the challenges presented in these instances of mis-issuance nor appropriately prepare our customers to handle rapid revocation and reissuance.

What can be done in the future? Along with avoiding mis-issuance, more automation will be helpful, but we also should work on tracking the practice runs. These subscribers are optimized for natural expiration. However, they are out of practice for unplanned rapid re-issuance events. We believe a combination of more automation tools and performing practice runs as a CA and as an industry will help significantly. Additionally, we are educating subscribers about their 1-day and 5-day responsibilities, and the requirement to perform this action, at various interaction points, e.g. revocation events, contract renewals, etc.

We have assessed the root cause of this and the other mis-issuance events and developed a robust remediation plan to address each root cause. This information will be laid out in detail in the forthcoming report to the community. Examples include redundancies in reviewing and interpreting changes in requirements; more robust change control procedures; tightened incident response procedures, and, to your earlier point, more education for our subscribers. Addressing the root cause of what led to each mis-issuance in the first place is within our control and we feel confident that the remediation plan we will be proposing will reduce the risk of future mis-issuances.

I think you are answering a different question from the one that was asked, though I don’t blame you for mixing up the many open incidents.
What you need to answer is not about mis-issuance, but about the topic of this incident, delayed revocation.

We are committed to meeting the standards and avoiding delayed revocation, and we are planning to educate our customers and shape our policies and organization structure as described above to meet that standard.

What are you doing to make sure that if Entrust mis-issues a bank’s certificate in July, we won’t see another delayed revocation? You say it yourself, I think exactly correctly: the measures taken against mis-issuance reduce the risk, they do not eliminate it. The BRs and WebPKI in general understand that mis-issuance is a virtual certainty over the long term, which is why the BRs have so much content about what is expected of a CA when they have mis-issued certificates. Entrust has chronically not met those expectations, and it’s the path to meeting them in the future that needs to be clear if Entrust is to remain allowed to issue WebPKI certificates that are relied upon by billions of humans and machines.

We tried to answer this question above. You’re right that we can’t stop all possibilities of a mis-issuance, although we are committed to continuing to invest in subscriber education, automation, process improvement, and internal training. We need to measure and target customers that need the additional help. Both the increase in automation and offering tabletop exercises are good next steps.

Additionally, it is important for you to explain what was done the last time Entrust promised not to delay revocation again, four years ago. Why did it not work? What is going to be different this time?

We believe we answered this above.

Here is our progress:

  • 540 of 1,176 (45.9%) certificates have been revoked or expired.
    • 537 revoked before expiration
    • 3 expired before revocation
  • 0 certificates have been re-issued with revocation pending.
  • 112 out of 114 customer accounts (98.2%) have fully remediated the issue (certificates re-issued and old certificates revoked or expired).

Here is the information per customer. We are working with each customer to accelerate the time to revocation.

  • Subscriber 1 (see comment 24 above) has 628 of 872 certificates remaining with an estimated date of completion - May 30, 2024.
  • Subscriber 4 (see comment 24 above) has 5 of 5 certificates remaining with an estimated date of completion – May 30, 2024.

(In reply to ngook.kong from comment #33)

Entrust defines disruption of the web ecosystem as involving critical infrastructure that is impactful to the economy or society. This would include effects on a vast subscriber customer-base, especially when there is significant impact to relying parties.

It might appear that customers of a subscriber can easily move from one service to another. The disruption is more significant because most of these subscribers are dealing with thousands or millions of consumers who have financial information located with a specific provider, have tickets with a certain airline, or have other financial transactions that are held with a subscriber. This potential disruption is further compounded when these subscribers have a large number of certificates.

Entrust does not want to make unilateral decisions to choose which subscribers get an extension and which subscribers do not. We are very much asking for assistance to seek a solution that will successfully address your concerns.

It is not Entrust's role as a Certificate Authority to manage risk for their subscribers. Your job is to revoke within the required deadlines, not to mull over excuses by your subscribers. Can you give an example of 3 subscribers who have detailed issues revoking in writing to this extent? Entrust have already detailed making unilateral decisions on who gets an extension, so I have no idea why that statement is there.

The solution is simple: Revoke within the required deadline. It is your role as a Certificate Authority.

For the immediate term, we have implemented PKIlint for post linting to detect mis-issued certificates. We have also been proactively offering these subscribers several free and paid options for automation and offering alternatives like private PKI. We are also considering investing in contributing to open source linting tools and process tracker tools for verification and issuance governance.

In previous instances, we made commitments but believed that our existing processes and policies would be sufficient to ensure that we could largely avoid delayed revocation in a future instance. We conducted webinars, subscriber/partner events, and private meetings about the need for crypto agility and how security vulnerabilities or other events require subscribers to revoke and replace certificates quickly. We also spent significant development resources creating integration tools for the major CLM providers, our own internal solutions, ACME, and ticketing systems like ServiceNow. We encourage all our customers to use these systems and provide many of these services free of charge. In hindsight, we did not anticipate the challenges presented in these instances of mis-issuance nor appropriately prepare our customers to handle rapid revocation and reissuance.

Okay so there is nothing different now than when you previously made commitments to improve. We have nothing of substance to guarantee you will be able to handle a revocation without delay going forward. A lack of anticipating challenges is showing a systemic failure in handling incident response that has shown no signs of changing.

What can be done in the future? Along with avoiding mis-issuance, more automation will be helpful, but we also should work on tracking the practice runs. These subscribers are optimized for natural expiration. However, they are out of practice for unplanned rapid re-issuance events. We believe a combination of more automation tools and performing practice runs as a CA and as an industry will help significantly. Additionally, we are educating subscribers about their 1-day and 5-day responsibilities, and about the requirement to perform this action, at various interaction points, e.g. revocation events, contract renewals, etc.

The subscribers are 'optimized' for natural expiration as you are not performing your duties. It is also NOT THE SUBSCRIBER'S RESPONSIBILITY to do 1 day and 5 day revocation - it is your core responsibility. Statements to this effect show a deep misunderstanding of the situation and this is months into having every requirement quoted back at Entrust who state they are fully aware of the situation.

We are committed to meeting the standards and avoiding delayed revocation, and we are planning to educate our customers and shape our policies and organization structure as described above to meet that standard.

Your actions speak otherwise, revoke by the deadline. Any plans you are considering should have already been here as detailed action items, never mind the years of commitments to improve that have resulted in these empty statements.

We tried to answer this question above. You’re right, we can’t stop all possibilities of a mis-issuance, although we are committed to continue to invest in subscriber education, automation, process improvement and internal training. We need to measure and target customers that need the additional help. Both the increase in automation and offering tabletop exercises are good next steps.

A good next step is revocation within the required deadlines. Any other statement is not acknowledging your role and effectively stating you are intending to do this in the future, again.

Additionally, it is important for you to explain what was done the last time Entrust promised not to delay revocation again, four years ago. Why did it not work? What is going to be different this time?

We believe we answered this above.

You have not answered that above. What went wrong between the promises to not delay revocation over 4 years ago and now? Entrust is still failing to do their core duties and blaming their subscribers.

(In reply to ngook.kong from comment #34)

Here is our progress:

  • 540 of 1,176 (45.9%) certificates have been revoked or expired.
  • 537 revoked before expiration
  • 3 expired before revocation
  • 0 certificates have been re-issued with revocation pending.
  • 112 out of 114 customer accounts (98.2%) have fully remediated the issue (certificates re-issued and old certificates revoked or expired).

Here is the information per customer, we are working with each customer to accelerate the time to revocation.

  • Subscriber 1 (see comment 24 above) has 628 of 872 certificates remaining with an estimated date of completion - May 30, 2024.
  • Subscriber 4 (see comment 24 above) has 5 of 5 certificates remaining with an estimated date of completion – May 30, 2024.

Hold on, are you stating that the 636 certificates remaining have not even been issued yet? At this moment in time there is no activity on fixing these other than waiting for your last two subscribers to perhaps deal with this within 48 hours after months of delays? Your per-subscriber breakdown also only adds up to 635...

I notice you used this statement in last week's report too, are you leaving the subscribers to handle requesting a certificate by their own means and not giving them one and telling them their current one will be revoked in x days?

Flags: needinfo?(ngook.kong)

Here is the information per customer, we are working with each customer to accelerate the time to revocation.

  • Subscriber 1 (see comment 24 above) has 628 of 872 certificates remaining with an estimated date of completion - May 30, 2024.
  • Subscriber 4 (see comment 24 above) has 5 of 5 certificates remaining with an estimated date of completion – May 30, 2024.

I can see the certificates were successfully revoked by this timeframe. Overall, of the 1,067 certificates revoked since 2024-03-21, 578 were revoked yesterday (05-30), which is 54%...

I am looking forward to seeing the report on this incident, it very much seems as if something changed in the past week. I am still unclear on what was stopping the revocation in the first 5 days but we will find out.

As of May 30 at 16:00 UTC: 1176 of 1176 certificates (100%) have been revoked or expired. 114 out of 114 customer accounts have fully remediated the issue (certificates re-issued and old certificates revoked or expired).

Flags: needinfo?(ngook.kong)

I'm seeing the last revocations were:
2024-05-30 19:29:27 UTC
2024-05-30 19:25:49 UTC

But yes, they are all revoked.
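
For anyone wanting to put a number on the overrun, here is a minimal sketch of the arithmetic. It assumes the 5-day clock started when the original incident report was published (2024-03-20 14:42 UTC); that start time is an assumption for illustration only, and the real awareness time may be earlier. The names `AWARENESS`, `REVOCATION_WINDOW` and `overrun` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the clock starts at publication of the original incident
# report (2024-03-20 14:42 UTC). The actual awareness time may differ.
AWARENESS = datetime(2024, 3, 20, 14, 42, tzinfo=timezone.utc)
REVOCATION_WINDOW = timedelta(days=5)

# The last two revocation timestamps quoted in this comment.
last_revocations = [
    datetime(2024, 5, 30, 19, 29, 27, tzinfo=timezone.utc),
    datetime(2024, 5, 30, 19, 25, 49, tzinfo=timezone.utc),
]


def overrun(awareness, revoked_at, window=REVOCATION_WINDOW):
    """Return how long after the deadline a revocation happened (zero if on time)."""
    deadline = awareness + window
    return max(revoked_at - deadline, timedelta(0))


final_revocation = max(last_revocations)
delay = overrun(AWARENESS, final_revocation)

print(f"deadline:         {AWARENESS + REVOCATION_WINDOW:%Y-%m-%d %H:%M} UTC")
print(f"final revocation: {final_revocation:%Y-%m-%d %H:%M:%S} UTC")
print(f"overrun:          {delay.days} days")
```

Changing `AWARENESS` to an earlier timestamp only increases the overrun, so the figure under this assumption is a lower bound.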

(In reply to Wayne from comment #38)

Hi Wayne, you are correct. This should have been called out as 19:29 UTC.

(In reply to ngook.kong from bug 1886532 comment #36)

We have no examples of a case in which we revoked without a response from the subscriber. Our Support teams worked 24/7 to ensure that every subscriber was contacted and responded, and we received responses from all our Subscribers by the conclusion of that effort.

. . .

We start with a presumption of denial, but there are circumstances that justify some period of delay.

Are these statements true of this bug also?

(In reply to Wayne from comment #35)

(In reply to ngook.kong from comment #33)
Can you give an example of 3 subscribers who have detailed issues revoking in writing to this extent?

We understand and acknowledge your desire for concrete examples. However, due to confidentiality agreements with our subscribers, we are not allowed to disclose specific details of correspondence with specific subscribers. We have listed a high-level overview of the reasons provided.

The subscribers are 'optimized' for natural expiration as you are not performing your duties. It is also NOT THE SUBSCRIBER'S RESPONSIBILITY to do 1 day and 5 day revocation - it is your core responsibility. Statements to this effect show a deep misunderstanding of the situation and this is months into having every requirement quoted back at Entrust who state they are fully aware of the situation.

We do not believe that it is accurate to say that a CA’s “core responsibility” is limited to a single function (revoking certificates). We do agree that compliance with the applicable industry standards is of central importance – we are committed to meeting the standards and avoiding delayed revocation.

We are committed to meeting the standards and avoiding delayed revocation, and we are planning to educate our customers and shape our policies and organization structure as described above to meet that standard.

Your actions speak otherwise, revoke by the deadline. Any plans you are considering should have already been here as detailed action items, never mind the years of commitments to improve that have resulted in these empty statements.

You have not answered that above. What went wrong between the promises to not delay revocation over 4 years ago and now? Entrust is still failing to do their core duties and blaming their subscribers.

As noted above in Comment 33:

“In previous instances, we made commitments but believed that our existing processes and policies would be sufficient to ensure that we could largely avoid delayed revocation in a future instance … In hindsight, we did not anticipate the challenges presented in these instances of mis-issuance nor appropriately prepare our customers to handle rapid revocation and reissuance”

I would add that we are putting concrete plans into motion to address root causes of recent issues and improve operations, and that these efforts are backed by our product, compliance, development, legal, operations, support and executive leadership teams. The web ecosystem is more resilient when our subscribers are supported with people, processes and tools that enable 24-hour or five-day revocation should the need arise.

(In reply to ngook.kong from comment #34)

Here is our progress:

  • 540 of 1,176 (45.9%) certificates have been revoked or expired.
  • 537 revoked before expiration
  • 3 expired before revocation
  • 0 certificates have been re-issued with revocation pending.
  • 112 out of 114 customer accounts (98.2%) have fully remediated the issue (certificates re-issued and old certificates revoked or expired).

Here is the information per customer, we are working with each customer to accelerate the time to revocation.

  • Subscriber 1 (see comment 24 above) has 628 of 872 certificates remaining with an estimated date of completion - May 30, 2024.
  • Subscriber 4 (see comment 24 above) has 5 of 5 certificates remaining with an estimated date of completion – May 30, 2024.

Hold on, are you stating that the 636 certificates remaining have not even been issued yet?

If your question relates to the line “0 certificates have been re-issued with revocation pending”, in some cases, new certificates get issued instead of existing certificates re-issued. Therefore these certificates cannot be directly linked to existing certificates, which means that these are not included in the number of certificates that have been re-issued with revocation pending.

At this moment in time there is no activity on fixing these other than waiting for your last two subscribers to perhaps deal with this within 48 hours after months of delays?

We worked closely with the subscribers and appreciate the hard work and long hours that many individuals put in to get the certificates revoked by the estimated May 30th date, which has now been completed.

I notice you used this statement in last week's report too, are you leaving the subscribers to handle requesting a certificate by their own means and not giving them one and telling them their current one will be revoked in x days?

Again, if this question relates to the line “0 certificates have been re-issued with revocation pending”, see our response above.

However, due to confidentiality agreements with our subscribers, we are not allowed to disclose specific details of correspondence with specific subscribers. We have listed a high-level overview of the reasons provided.

I don't think this is appropriate. You're breaking one of the core rules of being a CA, and then saying sorry we're keeping that information confidential. It seems like there's a disconnect here of understanding that the role of a CA is to be a steward of public trust, not a steward of NDAs.

We do not believe that it is accurate to say that a CA’s “core responsibility” is limited to a single function (revoking certificates).

CAs can, and do, have multiple "core responsibilities". Managing the lifecycle of a certificate is one of those responsibilities. Included in the lifecycle management of a certificate is revoking a certificate that is mis-issued according to the rules set out in the BRs.

in some cases, new certificates get issued instead of existing certificates re-issued.

What does this practically mean? A re-issuance is still a new issuance so I'm having a hard time understanding this.

(In reply to ngook.kong from comment #41)

I would add that we are putting concrete plans into motion to address root causes of recent issues and improve operations, and that these efforts are backed by our product, compliance, development, legal, operations, support and executive leadership teams. The web ecosystem is more resilient when our subscribers are supported with people, processes and tools that enable 24-hour or five-day revocation should the need arise.

Will you be tracking any of these in this ticket as action items as required for an incident report, and if not, where should the community look to see your concrete plans and follow their implementations?

Can you give an example of 3 subscribers who have detailed issues revoking in writing to this extent?
We understand and acknowledge your desire for concrete examples. However, due to confidentiality agreements with our subscribers, we are not allowed to disclose specific details of correspondence with specific subscribers. We have listed a high-level overview of the reasons provided.

Okay so Entrust cannot provide any examples that fit their narrative. Relying on 'confidentiality agreements' to refuse addressing compliance issues is not looked kindly upon in any other industry. Can Entrust even provide any boilerplate 'confidentiality agreement' that covers the scope of this request?

To further clarify Entrust stated the below in Comment 33:

Entrust defines disruption of the web ecosystem as involving critical infrastructure that is impactful to the economy or society. This would include effects on a vast subscriber customer-base, especially when there is significant impact to relying parties.

It might appear that customers of a subscriber can easily move from one service to another. The disruption is more significant as most of these subscribers are dealing with thousands or millions of consumers who have financial information located with a specific provider or have tickets with a certain airline, or have other financial transactions that are held with a subscriber. This potential disruption is further compounded when these subscribers have a large number of certificates.

Entrust does not want to make unilateral decisions to choose which subscribers get an extension and which subscribers do not. We are very much asking for assistance to seek a solution that will successfully address your concerns.

Yet Entrust cannot back up any statements that these hypothetical scenarios were actually encountered by their subscribers, nor that the subscribers even provided statements to this effect.

Please note that at no point was identification of the subscribers even asked for, merely 3 written reasons that fit the defined narrative. We have already had written reasons by subscribers posted by Entrust in this very incident, nullifying any 'confidentiality agreements' alleged to be stopping Entrust from being transparent.

We do not believe that it is accurate to say that a CA’s “core responsibility” is limited to a single function (revoking certificates). We do agree that compliance with the applicable industry standards is of central importance – we are committed to meeting the standards and avoiding delayed revocation.

Alignment with the baseline requirements, of which revocation is a security function, is a CA's core responsibility. Entrust's failure to address this or combat it across the past several months makes it clear where their priorities lie. To date we have had no clear statement by Entrust that delayed revocation will not happen again, as they are organisationally incapable of committing to this basic security guarantee.

I would add that we are putting concrete plans into motion to address root causes of recent issues and improve operations, and that these efforts are backed by our product, compliance, development, legal, operations, support and executive leadership teams. The web ecosystem is more resilient when our subscribers are supported with people, processes and tools that enable 24-hour or five-day revocation should the need arise.

All this statement is providing is empty words. Entrust already claimed to improve 4 years ago:

  • We will not make the decision not to revoke.
  • We will plan to revoke within the 24 hours or 5 days as applicable for the incident.
  • We will provide notice to our customers of our obligations to revoke and recommend action within 24 hours or 5 days based on the BR requirements.
  • We will recommend to our customers to implement automation of certificate management.
  • We will increase our ability for correct implementation and testing to ensure that certificate profiles will meet the latest CA/Browser Forum or root program requirements.
  • We will monitor the Mozilla incidents and the discussion list to discover problems which other CAs have experienced and how they were resolved. This will allow us to review and react if required to our own implementation. This will also help to minimize the number of mis-issued certificates, which will reduce the risk of late revocation.
  • We will manage and update our pre-issuance and post-issuance linting to discover or prevent the problem early.

What can you substantially show has changed in this timeframe with regard to these issues on a point-by-point basis? To date we still have certificates given 'deadlines' that are breezed past, and further 'deadlines' are rubber-stamped by Entrust with no regard for the integrity of their trust. Revoke the certificates by the deadline; it is not a hard problem to solve.

Further, Entrust still have not acknowledged that revocation is not the subscriber's responsibility: it is Entrust's responsibility to handle before the deadlines. They needed to change this approach months ago; instead we have commitments to non-changes that further emphasise a lack of responsibility and a belief that they do not need to be held to the same standards as other CAs.

If your question relates to the line “0 certificates have been re-issued with revocation pending”, in some cases, new certificates get issued instead of existing certificates re-issued. Therefore these certificates cannot be directly linked to existing certificates, which means that these are not included in the number of certificates that have been re-issued with revocation pending.

So new certificates have been issued to resolve non-compliant certificates, but unannounced despite the fact those would render the mis-issued certificate null and void?

We worked closely with the subscribers and appreciate the hard work and long hours that many individuals put in to get the certificates revoked by the estimated May 30th date, which has now been completed.
Again, if this question relates to the line “0 certificates have been re-issued with revocation pending”, see our response above.

I would not be celebrating taking 72 days to handle a 5 day revocation event. This is an abject failure and reflects a need to re-address your incident response and personnel top-to-bottom in order to show that Entrust are capable of being a respectable member of the WebPKI community.

Flags: needinfo?(ngook.kong)

(In reply to Tim Callan from comment #40)

(In reply to ngook.kong from bug 1886532 comment #36)

We have no examples of a case in which we revoked without a response from the subscriber. Our Support teams worked 24/7 to ensure that every subscriber was contacted and responded, and we received responses from all our Subscribers by the conclusion of that effort.

. . .

We start with a presumption of denial, but there are circumstances that justify some period of delay.

Are these statements true of this bug also?

Yes, these statements are also true of this bug.

(In reply to amir from comment #42)

However, due to confidentiality agreements with our subscribers, we are not allowed to disclose specific details of correspondence with specific subscribers. We have listed a high-level overview of the reasons provided.

I don't think this is appropriate. You're breaking one of the core rules of being a CA, and then saying sorry we're keeping that information confidential. It seems like there's a disconnect here of understanding that the role of a CA is to be a steward of public trust, not a steward of NDAs.

We do not believe that it is accurate to say that a CA’s “core responsibility” is limited to a single function (revoking certificates).

CAs can, and do, have multiple "core responsibilities". Managing the lifecycle of a certificate is one of those responsibilities. Included in the lifecycle management of a certificate is revoking a certificate that is mis-issued according to the rules set out in the BRs.

We agree with this more nuanced description of a CA’s responsibilities.

in some cases, new certificates get issued instead of existing certificates re-issued.

What does this practically mean? A re-issuance is still a new issuance so I'm having a hard time understanding this.

A re-issuance is a process that uses the same CSR and public key that was included in the original certificate request submitted by the subscriber. Alternatively, subscribers can choose to request a new certificate using a new CSR and with a new public key instead.
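
The distinction above can be illustrated with a minimal sketch. It is not a description of Entrust's systems; it only assumes that each certificate record carries a serial and the raw bytes of its public key, and that a replacement is "linked" when its key matches an old certificate's key. The record shape and the names `key_id` and `link_replacements` are hypothetical.

```python
import hashlib
from collections import defaultdict


def key_id(public_key_bytes: bytes) -> str:
    """Fingerprint a public key so certificates can be matched by key."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def link_replacements(old_certs, new_certs):
    """Map each old serial to the new serials that carry the same public key.

    Both arguments are iterables of (serial, public_key_bytes) pairs.
    """
    by_key = defaultdict(list)
    for serial, key in new_certs:
        by_key[key_id(key)].append(serial)
    return {serial: by_key.get(key_id(key), []) for serial, key in old_certs}


# Hypothetical data: old-1 is re-issued with the same key, while old-2 is
# replaced by a certificate requested with a fresh CSR and a new key.
old = [("old-1", b"key-A"), ("old-2", b"key-B")]
new = [("new-1", b"key-A"), ("new-2", b"key-C")]

links = link_replacements(old, new)
unlinked = [serial for serial, matches in links.items() if not matches]
print(links)
print("no linked replacement:", unlinked)
```

Under this assumption a re-issued certificate is found through its shared key, while a certificate requested with a new key has nothing to match on and would need some other link, such as the same account and identifiers.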

(In reply to macy from comment #43)

(In reply to ngook.kong from comment #41)

I would add that we are putting concrete plans into motion to address root causes of recent issues and improve operations, and that these efforts are backed by our product, compliance, development, legal, operations, support and executive leadership teams. The web ecosystem is more resilient when our subscribers are supported with people, processes and tools that enable 24-hour or five-day revocation should the need arise.

Will you be tracking any of these in this ticket as action items as required for an incident report, and if not, where should the community look to see your concrete plans and follow their implementations?

The paragraph above refers to our more holistic plans “to address root causes of recent issues and improve operations”, which are not necessarily unique or specific to this bug. We have created a detailed report addressing this and other incidents, including specific action items to implement our remediation plans. This report can be found at https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/LhTIUMFGHNw.

(In reply to Wayne from comment #44)

Can you give an example of 3 subscribers who have detailed issues revoking in writing to this extent?
We understand and acknowledge your desire for concrete examples. However, due to confidentiality agreements with our subscribers, we are not allowed to disclose specific details of correspondence with specific subscribers. We have listed a high-level overview of the reasons provided.

Okay so Entrust cannot provide any examples that fit their narrative. Relying on 'confidentiality agreements' to refuse addressing compliance issues is not looked kindly upon in any other industry. Can Entrust even provide any boilerplate 'confidentiality agreement' that covers the scope of this request?

To further clarify Entrust stated the below in Comment 33:

Entrust defines disruption of the web ecosystem as involving critical infrastructure that is impactful to the economy or society. This would include effects on a vast subscriber customer-base, especially when there is significant impact to relying parties.

It might appear that customers of a subscriber can easily move from one service to another. The disruption is more significant as most of these subscribers are dealing with thousands or millions of consumers who have financial information located with a specific provider or have tickets with a certain airline, or have other financial transactions that are held with a subscriber. This potential disruption is further compounded when these subscribers have a large number of certificates.

Entrust does not want to make unilateral decisions to choose which subscribers get an extension and which subscribers do not. We are very much asking for assistance to seek a solution that will successfully address your concerns.

Yet Entrust cannot back up any statements that these hypothetical scenarios were actually encountered by their subscribers, nor that the subscribers even provided statements to this effect.

Please note that at no point was identification of the subscribers even asked for, merely 3 written reasons that fit the defined narrative. We have already had written reasons by subscribers posted by Entrust in this very incident, nullifying any 'confidentiality agreements' alleged to be stopping Entrust from being transparent.

Confidentiality agreements often rely on a party exercising its reasonable judgment to understand whether information disclosed by one party would reasonably be understood to be confidential by the receiving party. There are often exceptions where disclosure can be made to comply with specific external obligations such as cooperation with law enforcement. In this case, we are exercising our reasonable judgment about what information disclosed by our subscribers should be treated as confidential, and what may be shared with the community to comply with public trust obligations.

We do not believe that it is accurate to say that a CA’s “core responsibility” is limited to a single function (revoking certificates). We do agree that compliance with the applicable industry standards is of central importance – we are committed to meeting the standards and avoiding delayed revocation.

Alignment with the baseline requirements, of which revocation is a security function, is a CA's core responsibility. Entrust's failure to address this or combat it across the past several months makes it clear where their priorities lie. To date we have had no clear statement by Entrust that delayed revocation will not happen again, as they are organisationally incapable of committing to this basic security guarantee.

I would add that we are putting concrete plans into motion to address root causes of recent issues and improve operations, and that these efforts are backed by our product, compliance, development, legal, operations, support and executive leadership teams. The web ecosystem is more resilient when our subscribers are supported with people, processes and tools that enable 24-hour or five-day revocation should the need arise.

All this statement is providing is empty words. Entrust already claimed to improve 4 years ago:

  • We will not make the decision not to revoke.
  • We will plan to revoke within the 24 hours or 5 days as applicable for the incident.
  • We will provide notice to our customers of our obligations to revoke and recommend action within 24 hours or 5 days based on the BR requirements.
  • We will recommend to our customers to implement automation of certificate management.
  • We will increase our ability for correct implementation and testing to ensure that certificate profiles will meet the latest CA/Browser Forum or root program requirements.
  • We will monitor the Mozilla incidents and the discussion list to discover problems which other CAs have experienced and how they were resolved. This will allow us to review and react if required to our own implementation. This will also help to minimize the number of mis-issued certificates, which will reduce the risk of late revocation.
  • We will manage and update our pre-issuance and post-issuance linting to discover or prevent the problem early.

What can you substantially show has changed in this timeframe with regard to these issues on a point-by-point basis? To date we still have certificates given 'deadlines' that are breezed past, and further 'deadlines' are rubber-stamped by Entrust with no regard for the integrity of their trust. Revoke the certificates by the deadline; it is not a hard problem to solve.

You may have intended this as a rhetorical question to emphasize the opinion that insufficient changes have been made, and we agree and acknowledge. We agree that insufficient changes were made over the last four years and are committed to making more extensive changes. Action items relating to this commitment are provided in Bug 1901270.

Further, Entrust still have not acknowledged that revocation is not the subscriber's responsibility: it is Entrust's responsibility to handle before the deadlines. They needed to change this approach months ago; instead we have commitments to non-changes that further emphasise a lack of responsibility and a belief that they do not need to be held to the same standards as other CAs.

We respectfully disagree with the opinion stated here that our commitments are to “non-changes” and regarding our belief about being held to a different standard. We believe that all CAs should be held to the CA/B Forum requirements in an equal manner. The existence of delayed revocation is not unique to Entrust and was even a topic of discussion at the most recent CA/B Forum in-person meeting in Italy.

If your question relates to the line “0 certificates have been re-issued with revocation pending”, in some cases, new certificates get issued instead of existing certificates re-issued. Therefore these certificates cannot be directly linked to existing certificates, which means that these are not included in the number of certificates that have been re-issued with revocation pending.

So new certificates have been issued to resolve non-compliant certificates, but unannounced despite the fact those would render the mis-issued certificate null and void?

Because these new certificates are not directly linked to the old certificates, we would have no reliable mechanism to include them in the report as “new certificate issued, but old certificate not revoked yet.”

(In reply to Wayne from comment #35)

Hold on, are you stating that the 636 certificates remaining have not even been issued yet? At this moment in time there is no activity on fixing these other than waiting for your last two subscribers to perhaps deal with this within 48 hours after months of delays? Your per-subscriber breakdown also only adds up to 635...

We have re-checked, and the number of certificates that had been revoked or expired at the time of Comment 34 was 543, not 540. This left 633 unrevoked at the time, which corresponded to our per-subscriber breakdown of 633 (628 for Subscriber 1 and 5 for Subscriber 4).
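
The reconciliation above can be checked with a short sketch. The figures are the ones quoted in this thread; the function name `reconcile` and the variable names are hypothetical.

```python
TOTAL_CERTIFICATES = 1176

# Revoked-or-expired counts: as first reported in Comment 34, and as corrected here.
REPORTED_DONE = 540
CORRECTED_DONE = 543

# Per-subscriber remaining counts from the Comment 34 breakdown.
remaining_by_subscriber = {"Subscriber 1": 628, "Subscriber 4": 5}


def reconcile(total, done, remaining):
    """Return (outstanding, listed, consistent) for a progress report."""
    outstanding = total - done
    listed = sum(remaining.values())
    return outstanding, listed, outstanding == listed


for label, done in (("reported", REPORTED_DONE), ("corrected", CORRECTED_DONE)):
    outstanding, listed, consistent = reconcile(
        TOTAL_CERTIFICATES, done, remaining_by_subscriber
    )
    share = 100 * done / TOTAL_CERTIFICATES
    print(f"{label}: {done} done ({share:.1f}%), {outstanding} outstanding, "
          f"{listed} listed per subscriber, consistent={consistent}")
```

A check like this, run before each weekly update is posted, would flag a report whose headline total and per-subscriber breakdown disagree.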

As of May 30, 2024, all outstanding certificates in this bug were revoked.

I notice you used this statement in last week's report too, are you leaving the subscribers to handle requesting a certificate by their own means and not giving them one and telling them their current one will be revoked in x days?

Perhaps we did not understand the questions or underlying concern correctly. Has the concern been addressed given that all were revoked by the dates given in the Comment 34 progress update? If not, could you please clarify the question/concern?

(In reply to Bruce Morton from comment #49)

(In reply to Wayne from comment #35)

Hold on, are you stating that the 636 certificates remaining have not even been issued yet? At this moment in time there is no activity on fixing these other than waiting for your last two subscribers to perhaps deal with this within 48 hours after months of delays? Your per-subscriber breakdown also only adds up to 635...

We have re-checked, and the number of certificates that had been revoked or expired at the time of Comment 34 was 543, not 540. This left 633 unrevoked at the time, which corresponded to our per-subscriber breakdown of 633 (628 for Subscriber 1 and 5 for Subscriber 4).

As of May 30, 2024, all outstanding certificates in this bug were revoked.

I notice you used this statement in last week's report too, are you leaving the subscribers to handle requesting a certificate by their own means and not giving them one and telling them their current one will be revoked in x days?

Perhaps we did not understand the questions or underlying concern correctly. Has the concern been addressed given that all were revoked by the dates given in the Comment 34 progress update? If not, could you please clarify the question/concern?

This question was already addressed in Comment 41.

(In reply to Bruce Morton from comment #48)

Confidentiality agreements often rely on a party exercising its reasonable judgment to understand whether information disclosed by one party would reasonably be understood to be confidential by the receiving party. There are often exceptions where disclosure can be made to comply with specific external obligations such as cooperation with law enforcement. In this case, we are exercising our reasonable judgment about what information disclosed by our subscribers should be treated as confidential, and what may be shared with the community to comply with public trust obligations.

So to answer my question as posed:

Can Entrust even provide any boilerplate 'confidentiality agreement' that covers the scope of this request?

No. Furthermore no transparency can be provided to back up claims made in this incident. I presume based off of your interpretation of confidential information and reasonable judgment that no such specific stipulations actually exist, and this is a pro-active defense on Entrust's behalf?

To further clarify Entrust stated the below in Comment 33:

Entrust defines disruption of the web ecosystem as involving critical infrastructure that is impactful to the economy or society. This would include effects on a vast subscriber customer-base, especially when there is significant impact to relying parties.

It might appear that customers of a subscriber can easily move from one service to another. The disruption is more significant as most of these subscribers are dealing with thousands or millions of consumers who have financial information located with a specific provider or have tickets with a certain airline, or have other financial transactions that are held with a subscriber. This potential disruption is further compounded when these subscribers have a large number of certificates.

Entrust does not want to make unilateral decisions to choose which subscribers get an extension and which subscribers do not. We are very much asking for assistance to seek a solution that will successfully address your concerns.

As already asked, can you state if any confidentiality agreements exist with your subscribers that cover providing the above information? Explicitly, or implicitly?

So far the statements provided are all hypothetical. Not only that, any and all claims of not wanting to make unilateral decisions are irrelevant. In a related incident Entrust already attested to doing such a thing, see: #1886532:

  1. We decided not to revoke if we did not see revocation progress, and/or no confirmed contact (either an email reply or talking to someone directly). The intent was to avoid unintended harmful consequences. As expressed before, improving this process is a key learning and area of improvement we are committed to and welcome your feedback accordingly.

You may have intended this as a rhetorical question to emphasize the opinion that insufficient changes have been made, and we agree and acknowledge. We agree that insufficient changes were made over the last four years and are committed to making more extensive changes. Action items relating to this commitment are provided in Bug 1901270.

Please re-read my question:

What can you substantially show has changed in this timeframe in regard to these issues on a point-by-point basis?

What in the past 4 years can Entrust show to prove compliance in the following points:

  • We will not make the decision not to revoke.
  • We will plan to revoke within the 24 hours or 5 days as applicable for the incident.
  • We will provide notice to our customers of our obligations to revoke and recommend action within 24 hours or 5 days based on the BR requirements.
  • We will recommend to our customers to implement automation of certificate management.
  • We will increase our ability for correct implementation and testing to ensure that certificate profiles will meet the latest CA/Browser Forum or root program requirements.
  • We will monitor the Mozilla incidents and the discussion list to discover problems which other CAs have experienced and how they were resolved. This will allow us to review and react if required to our own implementation. This will also help to minimize the number of mis-issued certificates, which will reduce the risk of late revocation.
  • We will manage and update our pre-issuance and post-issuance linting to discover or prevent the problem early.

Please explain, this is a learning opportunity.

We respectfully disagree with the opinion stated here that our commitments are to “non-changes” and regarding our belief about being held to a different standard. We believe that all CAs should be held to the CA/B Forum requirements in an equal manner. The existence of delayed revocation is not unique to Entrust and was even a topic of discussion at the most recent CA/B Forum in-person meeting in Italy.

You are entitled to respectfully disagree. I entirely agree that all CAs should be held in an equal manner, and you are entitled to comment on any other incidents Entrust feels are not being checked thoroughly enough. Alternatively, if Entrust are uncomfortable about commenting publicly I am willing to look into incidents on any issues you believe require more public oversight. This is an open request to everyone, not just Entrust.

However, I would advise that Entrust shows how it has kept to the above commitments over the past 4 years. It is okay to make a mistake, but I think the real mistake throughout all of this is not understanding that being honest is not a fault. If Entrust truly wishes to improve they need to show a willingness to admit fault, and that needs to be a top-down culture change.

Because these new certificates are not directly linked to the old certificates, we would have no reliable mechanism to include them in the report as “new certificate issued, but old certificate not revoked yet.”

Fair enough. Please consider how a CA can improve its weekly updates on this front, preferably by revoking on time.

Flags: needinfo?(bruce.morton)

(In reply to Paul van Brouwershaven from comment #24)

Ciena Corporation

  • Industry: IT
  • Reason for delay: Inadequate automation/resource constraints: more time required to perform manual replacement
  • Progress: 1 of 1 certificate remaining
  • Estimated date of completion: May 17, 2024

Paul, could you please provide the crt.sh link for the certificate that was outstanding for Ciena Corporation at this date, and explain how it specifically was related to critical infrastructure? From my understanding of Ciena's products, and the likely use of that certificate, it's unclear what the damage would have been to the economy or society if it had been revoked promptly. I'd like to make sure I understand why Entrust felt that it was appropriate to delay revocation of this certificate for 11 times the maximum provided-for in the BRs.

(In reply to Wayne from comment #50)

(In reply to Bruce Morton from comment #49)

(In reply to Wayne from comment #35)

Hold on, are you stating that the 636 certificates remaining have not even been issued yet? At this moment in time there is no activity on fixing these other than waiting for your last two subscribers to perhaps deal with this within 48 hours after months of delays? Your per-subscriber breakdown also only adds up to 635...

We have re-checked, and the number of certificates that had been revoked or expired at the time of Comment 34 was 543, not 540. This left 633 unrevoked at the time, which corresponded to our per-subscriber breakdown of 633 (628 for Subscriber 1 and 5 for Subscriber 4).

As of May 30, 2024, all outstanding certificates in this bug were revoked.

I notice you used this statement in last week's report too, are you leaving the subscribers to handle requesting a certificate by their own means and not giving them one and telling them their current one will be revoked in x days?

Perhaps we did not understand the questions or underlying concern correctly. Has the concern been addressed given that all were revoked by the dates given in the Comment 34 progress update? If not, could you please clarify the question/concern?

This question was already addressed in Comment 41.

(In reply to Bruce Morton from comment #48)

Confidentiality agreements often rely on a party exercising its reasonable judgment to understand whether information disclosed by one party would reasonably be understood to be confidential by the receiving party. There are often exceptions where disclosure can be made to comply with specific external obligations such as cooperation with law enforcement. In this case, we are exercising our reasonable judgment about what information disclosed by our subscribers should be treated as confidential, and what may be shared with the community to comply with public trust obligations.

So to answer my question as posed:

Can Entrust even provide any boilerplate 'confidentiality agreement' that covers the scope of this request?

No. Furthermore no transparency can be provided to back up claims made in this incident. I presume based off of your interpretation of confidential information and reasonable judgment that no such specific stipulations actually exist, and this is a pro-active defense on Entrust's behalf?

Our MSAs with customers include obligations to maintain the confidentiality of proprietary, non-public information belonging to our subscribers. In this case, while we worked individually with subscribers and required a rationale for approving delayed revocation, what we did not do (and should have and will do in the future) is preface those conversations by letting these subscribers know that whatever rationale provided would be put into the public domain alongside their name.

As a result, a lot of the rationales provided included details about where in their infrastructure these certificates were and exactly how their systems would be affected. Providing this level of detail with subscriber name might have left their infrastructure vulnerable. As a result, we summarized the rationales and paired them with the certificate serial number.

In the future, we intend to let our subscribers know that their wording will be made public ahead of time.

To further clarify Entrust stated the below in Comment 33:

Entrust defines disruption of the web ecosystem as involving critical infrastructure that is impactful to the economy or society. This would include effects on a vast subscriber customer-base, especially when there is significant impact to relying parties.

It might appear that customers of a subscriber can easily move from one service to another. The disruption is more significant as most of these subscribers are dealing with thousands or millions of consumers who have financial information located with a specific provider or have tickets with a certain airline, or have other financial transactions that are held with a subscriber. This potential disruption is further compounded when these subscribers have a large number of certificates.

Entrust does not want to make unilateral decisions to choose which subscribers get an extension and which subscribers do not. We are very much asking for assistance to seek a solution that will successfully address your concerns.

As already asked, can you state if any confidentiality agreements exist with your subscribers that cover providing the above information? Explicitly, or implicitly?

So far the statements provided are all hypothetical. Not only that, any and all claims of not wanting to make unilateral decisions are irrelevant. In a related incident Entrust already attested to doing such a thing, see: #1886532:

See answer above.

  1. We decided not to revoke if we did not see revocation progress, and/or no confirmed contact (either an email reply or talking to someone directly). The intent was to avoid unintended harmful consequences. As expressed before, improving this process is a key learning and area of improvement we are committed to and welcome your feedback accordingly.

You may have intended this as a rhetorical question to emphasize the opinion that insufficient changes have been made, and we agree and acknowledge. We agree that insufficient changes were made over the last four years and are committed to making more extensive changes. Action items relating to this commitment are provided in Bug 1901270.

Please re-read my question:

What can you substantially show has changed in this timeframe in regard to these issues on a point-by-point basis?

What in the past 4 years can Entrust show to prove compliance in the following points:

  • We will not make the decision not to revoke.
  • We will plan to revoke within the 24 hours or 5 days as applicable for the incident.
  • We will provide notice to our customers of our obligations to revoke and recommend action within 24 hours or 5 days based on the BR requirements.
  • We will recommend to our customers to implement automation of certificate management.
  • We will increase our ability for correct implementation and testing to ensure that certificate profiles will meet the latest CA/Browser Forum or root program requirements.
  • We will monitor the Mozilla incidents and the discussion list to discover problems which other CAs have experienced and how they were resolved. This will allow us to review and react if required to our own implementation. This will also help to minimize the number of mis-issued certificates, which will reduce the risk of late revocation.
  • We will manage and update our pre-issuance and post-issuance linting to discover or prevent the problem early.

Following these recent incidents, we have done a thorough review and root cause analysis of the commitments we made in 2020 and these recent incidents, particularly around decisions to revoke and delayed revocation. In summary: We didn't have sufficient leadership awareness of these commitments, nor a clear enough process for evaluating and closely managing exception requests.

As a result, we have made leadership changes in the Entrust digital certificate business unit. We've also reorganized to leverage our global compliance resourcing, expertise, and governance more fully within this business unit. And we have clarified across all levels of the organization how seriously we take the requirements set by the CA/Browser Forum and the root programs and our intent to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

To embed these commitments with our subscribers, we will actively discuss with them our responsibilities as an issuer of public trust certificates – including revocation requirements. We will discuss use of private trust certificates for use cases where customers are having challenges revoking within 5 days, and automation for customers required to deploy publicly rooted certificates into payment ecosystems – where meeting standards doesn’t allow them to revoke within 5 days.

We held numerous webinars with our customers on the need and benefits of automation.

We made progress on commitments made in 2020 to introduce automation. These include:

  • Support for ACME v2, available for free to all our public SSL customers;
  • Migrated all API users to our REST API, which offers more Certificate Lifecycle Management capabilities;
  • Maintaining Ansible plugins for common environments such as IIS, NginX, Apache, and F5;
  • Launched a connector to Microsoft Azure as well as ServiceNow
  • Launched an Entrust CLM solution.

Please explain, this is a learning opportunity.

We respectfully disagree with the opinion stated here that our commitments are to “non-changes” and regarding our belief about being held to a different standard. We believe that all CAs should be held to the CA/B Forum requirements in an equal manner. The existence of delayed revocation is not unique to Entrust and was even a topic of discussion at the most recent CA/B Forum in-person meeting in Italy.

You are entitled to respectfully disagree. I entirely agree that all CAs should be held in an equal manner, and you are entitled to comment on any other incidents Entrust feels are not being checked thoroughly enough. Alternatively, if Entrust are uncomfortable about commenting publicly I am willing to look into incidents on any issues you believe require more public oversight. This is an open request to everyone, not just Entrust.

However, I would advise that Entrust shows how it has kept to the above commitments over the past 4 years. It is okay to make a mistake, but I think the real mistake throughout all of this is not understanding that being honest is not a fault. If Entrust truly wishes to improve they need to show a willingness to admit fault, and that needs to be a top-down culture change.

We believe we have provided honest answers that take responsibility for our actions. We have made it clear across many responses that while we made progress over the past four years, our responses to these recent incidents did not live up to our standards. Further, we hope that the list of improvement measures we provided in the June 7 report also demonstrate to the community that we believe there is ample room for improvement and that we are committed to making the needed improvements.

Because these new certificates are not directly linked to the old certificates, we would have no reliable mechanism to include them in the report as “new certificate issued, but old certificate not revoked yet.”

Fair enough. Please consider how a CA can improve its weekly updates on this front, preferably by revoking on time.

Flags: needinfo?(ngook.kong)

(In reply to ngook.kong from comment #52)

So to answer my question as posed:

Can Entrust even provide any boilerplate 'confidentiality agreement' that covers the scope of this request?

No. Furthermore no transparency can be provided to back up claims made in this incident. I presume based off of your interpretation of confidential information and reasonable judgment that no such specific stipulations actually exist, and this is a pro-active defense on Entrust's behalf?

Our MSAs with customers include obligations to maintain the confidentiality of proprietary, non-public information belonging to our subscribers. In this case, while we worked individually with subscribers and required a rationale for approving delayed revocation, what we did not do (and should have and will do in the future) is preface those conversations by letting these subscribers know that whatever rationale provided would be put into the public domain alongside their name.

As a result, a lot of the rationales provided included details about where in their infrastructure these certificates were and exactly how their systems would be affected. Providing this level of detail with subscriber name might have left their infrastructure vulnerable. As a result, we summarized the rationales and paired them with the certificate serial number.

In the future, we intend to let our subscribers know that their wording will be made public ahead of time.

Thank you for articulating it is the MSA that is governing this confidentiality statement. In keeping with Entrust's prior understanding of Mozilla's 'Responding to an Incident' page, then Entrust should have already been more than aware that:

When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.

To that end, unless expressly stated within each MSA there is a duty bestowed onto Entrust to provide these rationales on a per-Subscriber basis. As a reminder my original question was:

Can you give an example of 3 subscribers who have detailed issues revoking in writing to this extent?

This is information that all parties should have been more than aware was a requirement in a delayed revocation event. I will note for context the extent is outlined within comment 33. If questions were answered when originally presented there would be no such need for this context.

As already asked, can you state if any confidentiality agreements exist with your subscribers that cover providing the above information? Explicitly, or implicitly?

So far the statements provided are all hypothetical. Not only that, any and all claims of not wanting to make unilateral decisions are irrelevant. In a related incident Entrust already attested to doing such a thing, see: #1886532:

See answer above.

Given the above can Entrust state if any explicit confidentiality clause is holding them from providing rationale on a per-Subscriber basis?

What can you substantially show has changed in this timeframe in regard to these issues on a point-by-point basis?

Following these recent incidents, we have done a thorough review and root cause analysis of the commitments we made in 2020 and these recent incidents, particularly around decisions to revoke and delayed revocation. In summary: We didn't have sufficient leadership awareness of these commitments, nor a clear enough process for evaluating and closely managing exception requests.

As a result, we have made leadership changes in the Entrust digital certificate business unit. We've also reorganized to leverage our global compliance resourcing, expertise, and governance more fully within this business unit. And we have clarified across all levels of the organization how seriously we take the requirements set by the CA/Browser Forum and the root programs and our intent to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

To embed these commitments with our subscribers, we will actively discuss with them our responsibilities as an issuer of public trust certificates – including revocation requirements. We will discuss use of private trust certificates for use cases where customers are having challenges revoking within 5 days, and automation for customers required to deploy publicly rooted certificates into payment ecosystems – where meeting standards doesn’t allow them to revoke within 5 days.

I will take this answer as a refusal to answer on a point-by-point basis, and note even ignoring that issue the answer does not sufficiently establish what has changed within the past 4 years. Given this relates to the MDSP report ultimately please include such root cause analysis on Entrust's failure to commit to prior commitments made in 2020 there. I would pay particular attention to how any such further commitments can be supported in comparison to prior commitments.

We held numerous webinars with our customers on the need and benefits of automation.

We made progress on commitments made in 2020 to introduce automation. These include:

  • Support for ACME v2, available for free to all our public SSL customers;
  • Migrated all API users to our REST API, which offers more Certificate Lifecycle Management capabilities;
  • Maintaining Ansible plugins for common environments such as IIS, NginX, Apache, and F5;
  • Launched a connector to Microsoft Azure as well as ServiceNow
  • Launched an Entrust CLM solution.

As this copy-and-pasted answer has already come up, I will not further address its faults in this incident.

We believe we have provided honest answers that take responsibility for our actions. We have made it clear across many responses that while we made progress over the past four years, our responses to these recent incidents did not live up to our standards. Further, we hope that the list of improvement measures we provided in the June 7 report also demonstrate to the community that we believe there is ample room for improvement and that we are committed to making the needed improvements.

The issue, ultimately, is that the RCA of the issues in the past 3 months shows that no such improvement has occurred across 4 years. If Entrust feels that is not the case, then it should have addressed this systemic flaw within their own report. I believe there is sufficient feedback on MDSP to Entrust's report to establish what the general sentiment has been to date. To that end, I hope that Entrust truly has introspection and establishes what went wrong on so many levels that is missing from their key report.

Flags: needinfo?(ngook.kong)

(In reply to Mike Shaver (:shaver emeritus) from comment #51)

(In reply to Paul van Brouwershaven from comment #24)

Ciena Corporation

  • Industry: IT
  • Reason for delay: Inadequate automation/resource constraints: more time required to perform manual replacement
  • Progress: 1 of 1 certificate remaining
  • Estimated date of completion: May 17, 2024

Paul, could you please provide the crt.sh link for the certificate that was outstanding for Ciena Corporation at this date, and explain how it specifically was related to critical infrastructure? From my understanding of Ciena's products, and the likely use of that certificate, it's unclear what the damage would have been to the economy or society if it had been revoked promptly. I'd like to make sure I understand why Entrust felt that it was appropriate to delay revocation of this certificate for 11 times the maximum provided-for in the BRs.

The crt.sh link is https://crt.sh/?sha256=025FDDC6D86C2EF95236FEACFF7EA2243AC2EB6FEE92F5F430F58551543D50BB

We admit that due to process gaps and a lapse in judgment by the responsible team leadership we did not record sufficient information to explain or prove how this certificate specifically was related to critical infrastructure.

As we have noted several times, the level of delayed revocation in this and other incidents does not live up to our standards, nor to what is expected in the Baseline Requirements and root program requirements. We take responsibility for that and are committed to making the process/leadership changes needed and to publicly tracking the progress on these items.

Flags: needinfo?(ngook.kong)

(In reply to Wayne from comment #53)

(In reply to ngook.kong from comment #52)

So to answer my question as posed:

Can Entrust even provide any boilerplate 'confidentiality agreement' that covers the scope of this request?

No. Furthermore no transparency can be provided to back up claims made in this incident. I presume based off of your interpretation of confidential information and reasonable judgment that no such specific stipulations actually exist, and this is a pro-active defense on Entrust's behalf?

Our MSAs with customers include obligations to maintain the confidentiality of proprietary, non-public information belonging to our subscribers. In this case, while we worked individually with subscribers and required a rationale for approving delayed revocation, what we did not do (and should have and will do in the future) is preface those conversations by letting these subscribers know that whatever rationale provided would be put into the public domain alongside their name.

As a result, a lot of the rationales provided included details about where in their infrastructure these certificates were and exactly how their systems would be affected. Providing this level of detail with subscriber name might have left their infrastructure vulnerable. As a result, we summarized the rationales and paired them with the certificate serial number.

In the future, we intend to let our subscribers know that their wording will be made public ahead of time.

Thank you for articulating it is the MSA that is governing this confidentiality statement. In keeping with Entrust's prior understanding of Mozilla's 'Responding to an Incident' page, then Entrust should have already been more than aware that:

When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.

To that end, unless expressly stated within each MSA there is a duty bestowed onto Entrust to provide these rationales on a per-Subscriber basis. As a reminder my original question was:

Can you give an example of 3 subscribers who have detailed issues revoking in writing to this extent?

This is information that all parties should have been more than aware was a requirement in a delayed revocation event. I will note for context the extent is outlined within comment 33. If questions were answered when originally presented there would be no such need for this context.

As already asked, can you state if any confidentiality agreements exist with your subscribers that cover providing the above information? Explicitly, or implicitly?

So far the statements provided are all hypothetical. Not only that, any and all claims of not wanting to make unilateral decisions are irrelevant. In a related incident Entrust already attested to doing such a thing, see: #1886532:

See answer above.

Given the above can Entrust state if any explicit confidentiality clause is holding them from providing rationale on a per-Subscriber basis?

Yes, we believe the information we have already shared, and the information not shared, represents what we can do in the current situation under the confidentiality clauses in our MSAs (Subscriber Agreements). We have also acknowledged that we did not sufficiently notify our subscribers about the need to disclose the rationales they provided on a per-Subscriber basis. We believe it would have been necessary to provide explicit notice of this, because the Subscribers do not have the level of awareness and familiarity with revocation obligations and processes that the members of this community have. We have agreed that it is our responsibility both to educate subscribers about industry requirements, and to provide explicit notices regarding the public nature of rationales for delayed revocation, and we have committed to do this in the future.

What can you substantially show has changed in this timeframe in regard to these issues on a point-by-point basis?

Following these recent incidents, we have done a thorough review and root cause analysis of the commitments we made in 2020 and these recent incidents, particularly around decisions to revoke and delayed revocation. In summary: We didn't have sufficient leadership awareness of these commitments, nor a clear enough process for evaluating and closely managing exception requests.

As a result, we have made leadership changes in the Entrust digital certificate business unit. We've also reorganized to leverage our global compliance resourcing, expertise, and governance more fully within this business unit. And we have clarified across all levels of the organization how seriously we take the requirements set by the CA/Browser Forum and the root programs and our intent to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

To embed these commitments with our subscribers, we will actively discuss with them our responsibilities as an issuer of public trust certificates – including revocation requirements. We will discuss use of private trust certificates for use cases where customers are having challenges revoking within 5 days, and automation for customers required to deploy publicly rooted certificates into payment ecosystems – where meeting standards doesn’t allow them to revoke within 5 days.

I will take this answer as a refusal to answer on a point-by-point basis, and note even ignoring that issue the answer does not sufficiently establish what has changed within the past 4 years. Given this relates to the MDSP report ultimately please include such root cause analysis on Entrust's failure to commit to prior commitments made in 2020 there. I would pay particular attention to how any such further commitments can be supported in comparison to prior commitments.

We held numerous webinars with our customers on the need and benefits of automation.

We made progress on commitments made in 2020 to introduce automation. These include:

  • Support for ACME v2, available for free to all our public SSL customers;
  • Migrated all API users to our REST API, which offers more Certificate Lifecycle Management capabilities;
  • Maintaining Ansible plugins for common environments such as IIS, NginX, Apache, and F5;
  • Launched a connector to Microsoft Azure as well as ServiceNow
  • Launched an Entrust CLM solution.

As this copy-and-pasted answer has already come up, I will not further address its faults in this incident.

We believe we have provided honest answers that take responsibility for our actions. We have made it clear across many responses that while we made progress over the past four years, our responses to these recent incidents did not live up to our standards. Further, we hope that the list of improvement measures we provided in the June 7 report also demonstrate to the community that we believe there is ample room for improvement and that we are committed to making the needed improvements.

The issue, ultimately, is that the RCA of the issues in the past 3 months shows that no such improvement has occurred across 4 years. If Entrust feels that is not the case, then it should have addressed this systemic flaw within their own report. I believe there is sufficient feedback on MDSP to Entrust's report to establish what the general sentiment has been to date. To that end, I hope that Entrust truly has introspection and establishes what went wrong on so many levels that is missing from their key report.

We understand your assessment and will be sharing updates to our report.

All certificates have been revoked and there are no open action items. We request this bug be closed.

Flags: needinfo?(bruce.morton)

I will close this on or about 5-July-2024, unless there are issues or concerns that haven't been addressed.

Flags: needinfo?(bwilson)