Open Bug 1898848 Opened 1 month ago Updated 3 days ago

Entrust: Delayed revocation of certificates affected by Jurisdiction issue in some EV TLS & Code Signing certificates

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: ngook.kong, Assigned: ngook.kong)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Attachments

(1 file, 1 obsolete file)


Incident Report

Summary

On May 18, 2024, we created bug 1897630 to report an incident with the jurisdiction data of some EV TLS & Code Signing certificates. The affected certificates were revoked or had expired by May 21, 2024, but this was more than 5 days from when Entrust should have been aware of the incident, and we are therefore submitting this report regarding the causes for this delay.

Our analysis shows that due to insufficient processes and resources we failed to investigate, escalate, confirm and report the incident, and consequently to revoke the affected certificates, quickly enough to meet the timeframes.

Impact

This delayed reporting bug impacts the same certificates affected by bug 1897630. All affected certificates had expired or were revoked by May 21, 2024, 9:30 AM UTC.

Timeline

All times are UTC.

2021-03-03:

  • Bug 1696227 (Mar 2021): Incorrect Jurisdiction Country Value in an EV Certificate
    • This incident is about certificates where the jurisdiction country is set to the value of “ZA” when it should have been set to “BW”. While this is related to incorrect information in the jurisdiction data of the certificate, this does not seem to be directly related to this incident.

2022-11-28:

  • Bug 1802916 (Nov 2022): EV TLS Certificate incorrect jurisdiction
    • In this incident report we identified that the jurisdiction state or province was used when the registry was from the country level. We did not identify cases where the jurisdiction state or province was missing and/or the registry was also from the country level.

2023-03-14:

  • Dropdown functionality was implemented for Private Organizations as described in bug 1802916 comment 7.

2023-11-28:

  • Bug 1867130 (Nov 2023): Jurisdiction Locality Wrong in EV Certificate
    • This incident detected a postal code in the jurisdiction locality field for a government entity (which does not come from the drop-down list) and was caused by insufficient indication of changes.

2024-03-09:

  • An ad hoc scan with pkilint was run in relation to bug 1883843, as part of the preparation work to implement pkilint as a post-issuance linter. The report shared by the engineer was seen as confirming the known incident (bug 1883843) and was not reviewed or investigated further by the compliance team because it was just a list of errors without identifying the certificates. The team requested a report that would include certificate numbers. Unbeknownst to the compliance team, 42 of the thousands of errors in the report related to the jurisdiction locality issue. The engineer working on this fix left the company, the task halted, and the escalation process was not followed.

2024-04-03:

  • 13:12 A new scan with pkilint was started; initial results (while the scan was still running) highlighted an error where the jurisdiction locality was present and the state or province was missing. The issue was escalated to our verification team for further investigation.
  • 19:50 Verification data indicated that the organization profiles had been validated at the country level and that the locality was not listed for these jurisdictions.
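
For illustration only, the kind of consistency check that surfaces this error can be sketched as below. This is not pkilint's implementation or our verification logic; the function name, the `registry_level` values and the field names are hypothetical, and the rule is simplified to "the jurisdiction fields present must match the level at which the registry operates".

```python
# Hypothetical mapping: which jurisdiction fields are expected for a
# registry operating at a given level (simplified for illustration).
EXPECTED_FIELDS = {
    "country": {"country"},
    "state": {"country", "state_or_province"},
    "locality": {"country", "state_or_province", "locality"},
}


def jurisdiction_errors(registry_level, fields):
    """Return a list of problems with the jurisdiction fields of one certificate.

    registry_level: "country", "state" or "locality" (hypothetical labels).
    fields: dict of field name -> value; empty or None values count as absent.
    """
    present = {name for name, value in fields.items() if value}
    expected = EXPECTED_FIELDS[registry_level]
    errors = [f"missing {name}" for name in sorted(expected - present)]
    errors += [f"unexpected {name}" for name in sorted(present - expected)]
    return errors


# Placeholder values only; these are not taken from any affected certificate.
print(jurisdiction_errors("country", {"country": "XX", "locality": "Example City"}))
print(jurisdiction_errors("locality", {"country": "XX", "locality": "Example City"}))
print(jurisdiction_errors("country", {"country": "XX"}))
```

The first call models a profile validated at the country level whose certificate still carries a locality; the second models a locality present without a state or province.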

2024-04-04:

  • 12:42 A compliance team member reviewed the issue and determined that the locality information in the certificates was likely incorrect. The issue was discussed with the verification team who started a deeper investigation into why their data was not the same as in the certificates.
  • 15:00 The issue was discussed between compliance and the verification team resulting in the need for further investigation.

2024-04-08:

  • We detected a certificate issued to a government entity with the same issue; however, the logic for government entities is different from that for private organizations. Private organizations leverage our pre-verified jurisdiction list located at https://www.entrust.com/legal-compliance/approved-incorporating-agencies, whereas government organizations are handled manually.

2024-04-11:

  • 10:05 pkilint was added as a post-issuance linter in production.

2024-04-15:

  • 11:48 The problem was included in a report from a partial scan with pkilint during the implementation of the post-issuance linter. The results of this scan were included in the communication to the product compliance team manager and should have been escalated at that time through established processes. It was not escalated because the compliance team manager incorrectly assumed the data was reporting the cPSuri problem from bug 1883843 that was already being addressed.

2024-05-12:

  • 02:19 A member of the product compliance team (unprompted) identified that this issue had not been reported and actioned. Following process, senior leadership was informed, and our incident handling procedure initiated.

2024-05-13:

  • 13:00 The product compliance team manager formally started an investigation.

2024-05-16:

  • 11:55 Mis-issuance confirmed and final certificate data verified.
  • 11:55 We started the 5-day revocation clock.
  • 16:00 Notified subscribers of the impacted certificates and that they would be revoked within 5 days.

2024-05-21:

  • 09:30 All remaining impacted certificates were revoked.

Root Cause Analysis

1. Why was there a problem?

According to s. 4.9.1 of the Baseline Requirements, a certificate must be revoked within 5 days of a CA becoming aware that it was not issued in accordance with the applicable EV Guidelines requirements. However, Entrust failed to investigate and escalate data that would have identified and confirmed the mis-issuance, and triggered revocation of the affected certificates, in a timely manner.
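
The deadline arithmetic can be sketched as follows. This is a minimal illustration, not a statement of when awareness legally began: the first start time is the 2024-05-16 11:55 entry from the timeline above, while the second (the 2024-04-04 12:42 review) is chosen purely as a hypothetical earlier awareness point. The function names are likewise illustrative.

```python
from datetime import datetime, timedelta, timezone

REVOCATION_WINDOW = timedelta(days=5)  # 5 days = 120 hours


def revocation_deadline(aware_at):
    """Latest permitted revocation time for a given awareness time."""
    return aware_at + REVOCATION_WINDOW


def is_on_time(aware_at, revoked_at):
    """True if revocation happened within the 5-day window."""
    return revoked_at <= revocation_deadline(aware_at)


utc = timezone.utc
revoked = datetime(2024, 5, 21, 9, 30, tzinfo=utc)          # from the timeline
clock_confirmed = datetime(2024, 5, 16, 11, 55, tzinfo=utc)  # from the timeline
# Assumption for illustration only: treat the 2024-04-04 12:42 review as the
# point at which the clock should have started.
clock_earlier = datetime(2024, 4, 4, 12, 42, tzinfo=utc)

print(is_on_time(clock_confirmed, revoked))  # True
print(is_on_time(clock_earlier, revoked))    # False
```

Measured from the confirmation on 2024-05-16 the revocation falls inside the window; measured from any of the earlier points at which the data was available, it does not, which is why this report was filed.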

2. Why did the initial scan on 2024-03-09 not trigger further investigation?

The 2024-03-09 scan that initially generated the relevant data was run by an engineer as an experimental quick scan to learn more about pkilint; the work was not being tracked as a production project. The report generated by the scan included only the linting output, without identifying the certificates that generated these errors. The compliance team reviewed the report only far enough to identify deficiencies in the report itself; based on this high-level review, the team believed the report only confirmed the known incident (bug 1883843). The 42 errors relating to the jurisdiction locality issue were buried within a list of thousands of cPSuri errors and were not noticed in the high-level review.
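
As a sketch of how a small class of findings can be kept visible inside a large report, the findings can be grouped by type and any type not already tied to a known incident flagged for review. The record layout (`code`, `serial`), the finding codes and the count of 3000 below are hypothetical placeholders; they are not pkilint's output format or the actual figures, apart from the 42 locality findings mentioned above.

```python
from collections import Counter


def summarize_findings(findings):
    """Count findings per code, rarest first, so small classes are not buried."""
    counts = Counter(f["code"] for f in findings)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


def unknown_codes(findings, known_codes):
    """Return finding codes that are not explained by an already-known incident."""
    return sorted({f["code"] for f in findings} - set(known_codes))


# Placeholder data: "thousands" of findings for the known issue is modelled as 3000.
findings = (
    [{"code": "hypothetical.cps_uri", "serial": f"cert-{i}"} for i in range(3000)]
    + [{"code": "hypothetical.jurisdiction_locality", "serial": f"cert-x{i}"} for i in range(42)]
)

print(summarize_findings(findings))
print(unknown_codes(findings, known_codes={"hypothetical.cps_uri"}))
```

Keeping a certificate identifier on every record (here the hypothetical `serial` field) is what allows a flagged finding to be traced back to specific certificates.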

3. Why didn’t the results of the 2024-04-03 or 2024-04-15 pkilint scans trigger further investigation?

The pkilint scan on 2024-04-03 produced preliminary results highlighting an error that triggered further review. That review led a compliance team member to find that information in the certificates was likely incorrect and warranted further investigation, but the email reporting this and recommending further investigation was missed because of the volume of email received during this time.

A report from a third scan on 2024-04-15 also included the problem; this report was communicated by email to the product compliance team manager for investigation/escalation. It was not actioned because the manager assumed the report related to the cPSuri problem from bug 1883843 that was already being addressed.

4. Why were the existing processes and resources not sufficient to trigger investigation, escalation, and confirmation of an incident, and revocation of the affected certificates, within an acceptable timeframe?

At relevant points in the timeline for this bug, the compliance team was responding to other certificate revocation-event incidents that had impacts on a large number of subscribers and generated a large volume of questions from the community demanding the team’s time, energy and attention.

Existing processes rely on communication, tracking, and reporting of potential incidents and management of revocation events via email, team discussions, and other processes that are susceptible to human errors, especially in situations such as the one described above.

In addition, the authority to launch and conduct formal investigations, confirm incidents, initiate incident reporting processes, and trigger revocation events is held by a small number of individuals within the compliance team. These same individuals were responsible for helping to respond to incidents, communicating with impacted subscribers, responding to questions from the Bugzilla community, and drafting and submitting incident reports, in addition to other day-to-day responsibilities.

Lessons Learned

  • We need to improve investigation and escalation processes for potential incidents and provide additional support to teams responsible for certificate compliance and verification.

What went well

  • Once senior leadership was notified of the event, the mis-issuance was confirmed and the impacted certificates were revoked within 5 days as required.

What didn't go well

  • Human errors resulted in data not being investigated, escalated or actioned in a timely way.

Where we got lucky

  • A product compliance team member eventually realized (unprompted) that the issue had not been reported or actioned and notified senior leadership.

Action Items

We have identified the following action items and will consider this issue in our reflection on our recent incidents; we will publish that report to Mozilla and the community on or before June 7.

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Review applicable policy and procedures | Prevent | June 28, 2024 |
| Improve our internal problem reporting mechanism reported by internal staff | Detect | July 31, 2024 |
| Reorganize product compliance and verification teams to provide additional organizational resources and oversight | Prevent | July 31, 2024 |
| Implement additional input validation controls for verification | Mitigate | July 26, 2024 |
| Implement pkilint as post-issuance linter | Detect | Done |

Appendix

Details of affected certificates

Attached is the list of impacted certificates.

So is Entrust still not creating action items that help prevent delayed revocations in the future?

What we need from this report is explanations of why the revocations did not occur within the required timeframe and action items to show it will never happen again. Ideally this would also include prior commitments and what went wrong in those promises too.

4. Why were the existing processes and resources not sufficient to trigger investigation, escalation, and confirmation of an incident, and revocation of the affected certificates, within an acceptable timeframe?

At relevant points in the timeline for this bug, the compliance team was responding to other certificate revocation-event incidents that had impacts on a large number of subscribers and generated a large volume of questions from the community demanding the team’s time, energy and attention.

Existing processes rely on communication, tracking, and reporting of potential incidents and management of revocation events via email, team discussions, and other processes that are susceptible to human errors, especially in situations such as the one described above.

In addition, the authority to launch and conduct formal investigations, confirm incidents, initiate incident reporting processes, and trigger revocation events is held by a small number of individuals within the compliance team. These same individuals were responsible for helping to respond to incidents, communicating with impacted subscribers, responding to questions from the Bugzilla community, and drafting and submitting incident reports, in addition to other day-to-day responsibilities.

You're not really answering the question. The first paragraph spends time talking about what the CA is required to do during any incident. The second is mentioning that humans are working in your company. The third highlights an authority restriction for handling incidents, but again misunderstands that the aspects detailed are the basic requirements for operating as a CA. Why were the existing processes and resources not sufficient?

I will note we have been told in the other incidents that an incident handling lifecycle process tool is in the process of being implemented. Why is this not mentioned anywhere in the report? The action items in this report should be about the delayed revocation, not the initial issue.

As required can you provide a per-subscriber breakdown of each impacted certificate?

Flags: needinfo?(ngook.kong)
Attached file Jurisdiction 2024-05-24 CT.csv (obsolete)
Flags: needinfo?(ngook.kong)

(In reply to Wayne from comment #4)

What we need from this report is explanations of why the revocations did not occur within the required timeframe and action items to show it will never happen again. Ideally this would also include prior commitments and what went wrong in those promises too.

  4. Why were the existing processes and resources not sufficient to trigger investigation, escalation, and confirmation of an incident, and revocation of the affected certificates, within an acceptable timeframe?
    At relevant points in the timeline for this bug, the compliance team was responding to other certificate revocation-event incidents that had impacts on a large number of subscribers and generated a large volume of questions from the community demanding the team’s time, energy and attention.
    Existing processes rely on communication, tracking, and reporting of potential incidents and management of revocation events via email, team discussions, and other processes that are susceptible to human errors, especially in situations such as the one described above.
    In addition, the authority to launch and conduct formal investigations, confirm incidents, initiate incident reporting processes, and trigger revocation events is held by a small number of individuals within the compliance team. These same individuals were responsible for helping to respond to incidents, communicating with impacted subscribers, responding to questions from the Bugzilla community, and drafting and submitting incident reports, in addition to other day-to-day responsibilities.

You're not really answering the question. The first paragraph spends time talking about what the CA is required to do during any incident. The second is mentioning that humans are working in your company. The third highlights an authority restriction for handling incidents, but again misunderstands that the aspects detailed are the basic requirements for operating as a CA. Why were the existing processes and resources not sufficient?

Please allow us to clarify our response: the existing processes were insufficient because they were overly vulnerable to human error, contained bottlenecks, lacked automation, and did not include mechanisms to scale up incident response efforts to handle a surge in the volume and complexity of work required to respond to all of the different incidents.

I will note we have been told in the other incidents that an incident handling lifecycle process tool is in the process of being implemented. Why is this not mentioned anywhere in the report? The action items in this report should be about the delayed revocation, not the initial issue.

The action items in this report include “improve our internal problem reporting mechanism reported by internal staff”. The proposed incident handling lifecycle process tool may be one of these improvements, but we intend to consider other possible improvements as well.

As required can you provide a per-subscriber breakdown of each impacted certificate?

See Attachment #9404720.

Attachment #9404720 - Attachment is obsolete: true

As this is a delayed revocation incident, you will need to provide a per-subscriber breakdown of what stopped Entrust from revoking within 5 days. I would emphasize that, given the quantity of delayed revocation incidents by Entrust lately, a subscriber using an excuse more than once is a problem.

A delayed revocation event is not supposed to repeat, especially for the same subscribers. It is the Certificate Authority's duty to revoke within the required deadlines. Please show that Entrust is capable of operating to this bare minimum.

Flags: needinfo?(ngook.kong)

(In reply to Amir from comment #2)

So is Entrust still not creating action items that help prevent delayed revocations in the future?

Hi, Amir. There are several actions we have taken or will be taking in the near term to prevent delayed revocations in the future.

1. Prevention:
a. We will be expanding our use of linters pre- and post-issuance for all certificate types as well as expanding our use of linters for other cryptographic or compliance objects.
b. We will also be implementing a robust cross-functional change control process to cover full lifecycle planning and execution of changes for public trust certificates with layers of redundancy and approval.

2. Formal revocation handling process:
a. We will further formalize the process for ensuring that any delayed revocation requests meet established criteria for approval, are granted on a very limited basis with a presumption of denial and are communicated to the community with the required level of detail.

3. Subscriber Education & Advisory:
a. We will be launching new, targeted communication and education around the requirements for public trust certificates, alternatives available to Subscribers (e.g., use of private certificates) and necessity of automation to enable rapid revocation and replacement of affected certificates.

We also look forward to further discussion through the CA/B Forum at the appropriate time on the current revocation timelines and related requirements to balance the needs of the web ecosystem against the realities for Subscribers. More details on these action items, including timelines for completion, will be supplied in our forthcoming report to Mozilla and the Bugzilla community.

Flags: needinfo?(ngook.kong)
Assignee: nobody → ngook.kong
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [leaf-revocation-delay]

Ngook:
From 2a. is it right that you are saying that you will continue to intentionally support your subscribers in delaying revocations, despite it being a clear part of the guidelines not to?
With all the Entrust incidents and those from other CAs, how can you make this an 'action point'?
Is the community meant to believe that you do not violate other parts of the guidelines to make things 'easier' for your customers?
Why can you not commit to just following the rules that are set?

I think I’ve said this a dozen times so far. Entrust, you can not “prevent” delayed revocation by trying to prevent incidents that lead to a revocation requirement. I can promise you that incidents necessitating revocation and mass revocation will happen. So the prevention step in #1 will not prevent delayed revocations. The only thing that will prevent delayed revocations is doing the revocations on time.

As for #2, at this point I think the community needs to know the exact criteria you’ll be using for delaying revocation. You’ve made a promise in the past that you won’t delay revocation ever again, and now you’re going against that promise by making another promise that you’ll have specific criteria.

I formally ask Mozilla to not consider this bug resolved until Entrust explicitly details the decision matrix and flowcharts they will be using to make that determination.

I also would like to ask that Entrust be required to show how each subscriber in their current open delayed revocation incidents would fall in that decision matrix. Specifically I’d like to see this question answered: are there any subscribers where a delayed revocation was granted before but wouldn’t be in the future under this framework?

(In reply to ngook.kong from comment #8)

Hi, Amir. There are several actions we have taken or will be taking in the near term to prevent delayed revocations in the future.

1. Prevention:
a. We will be expanding our use of linters pre- and post-issuance for all certificate types as well as expanding our use of linters for other cryptographic or compliance objects.
b. We will also be implementing a robust cross-functional change control process to cover full lifecycle planning and execution of changes for public trust certificates with layers of redundancy and approval.

2. Formal revocation handling process:
a. We will further formalize the process for ensuring that any delayed revocation requests meet established criteria for approval, are granted on a very limited basis with a presumption of denial and are communicated to the community with the required level of detail.

3. Subscriber Education & Advisory:
a. We will be launching new, targeted communication and education around the requirements for public trust certificates, alternatives available to Subscribers (e.g., use of private certificates) and necessity of automation to enable rapid revocation and replacement of affected certificates.

We also look forward to further discussion through the CA/B Forum at the appropriate time on the current revocation timelines and related requirements to balance the needs of the web ecosystem against the realities for Subscribers. More details on these action items, including timelines for completion, will be supplied in our forthcoming report to Mozilla and the Bugzilla community.

Ngook,
You have joined others from Entrust in simply ignoring the extensive commentary on your open bugs from the community about your CA’s willful ignorance of revocation requirements. It is painful to watch Entrust continually pretending it hasn’t read the comments that do not support its narrative.

So I will make this comment abundantly clear, and I will include questions to make it unambiguous that an on-point reply is expected from Entrust within seven days.

This is one of five delayed revocation bugs Entrust has open right now. So far as I can see, Entrust has neglected in four of these five bugs to provide action items that deal with the failure to revoke on time. I will remind you once again of what Entrust repeatedly has been reminded of on this forum, which is that action items addressing the original cause of misissuance are not action items addressing the failure to revoke. Rather, in a delayed revocation bug, the community expects the CA to provide action items that address the specific causes of this specific failure, that look not just at surface level causes but also at true root causes, and that can reasonably be expected to prevent this error in the future. Repeated failures of the same nature will be looked upon unfavorably, as they reveal a CA that is failing to learn from its and the community’s mistakes and continually improve.

Entrust created the first of its five open bugs regarding failure to revoke correctly on March 20, 2024 for its refusal to meet BR requirements for an incident that it knew about on March 4. That report did not contain action items that could reasonably be expected to prevent repetition of this problem, and 41 comments later the thread still does not. Four more failed revocation bugs followed since then, and only the last of them, this one, contains any action items that address the root cause of the failure to revoke. As this bug has a different root cause than the other four, the analysis and action items in comment 1 do not contribute to resolution of the repeated problem exemplified in the other four bugs.

I will further remind the community of what you also are well aware of, which is that the Subscriber’s inability to manage basic certificate operations with resilience and agility is no excuse for the CA’s failure to follow required revocation behaviors detailed in the Baseline Requirements.

With all this background, it is noteworthy that Entrust continues not to provide clear, concrete action items to solve this ongoing deficiency. Your points 1.a and 1.b above relate to original misissuance and not the failure to revoke. Your point 2.a begins with the premise that you will be conducting late revocations again in the future, which is antithetical to the point of this entire exercise. Your point 3.a is about encouraging Subscriber resiliency and as such does not address Entrust’s failure in this and its other late revocation incidents. And Entrust is yet to provide appropriate action items for your four other, open, failed revocation incidents.

One possible conclusion is that Entrust has no intention ever to obey mandated revocation timelines, and therefore that Entrust will continue simply to refuse to commit to appropriate action items. I hope that is not the case.

If that is not the case, then Entrust should list a set of genuine, concrete, measurable action items that convincingly stand to rectify the CA’s recidivism and ensure no new delayed revocation bugs ever. I will remind you once again that such action items are not actions to prevent or minimize misissuance and are not to address Subscribers’ unwillingness or inability to swap out certificates in time spans shorter than 120 hours. The action items I describe here will be changes Entrust will make and be held accountable for that will ensure, regardless of any Subscriber situation or behavior whatsoever, that any misissued certificate at any time in the future will be revoked within 120 or 24 hours, in any volume up to and including Entrust’s entire corpus of active certificates.

So now, with all that preamble, here are my questions.

Question 1: Does Entrust intend in the future to obey mandated revocation timelines? Please provide a definitive and clear answer, avoiding opaque noncommittal language.

Question 2: If the answer to question 1 is no, why not?

Question 3: Does Entrust intend to provide relevant action items for bug 1887705, bug 1886532, bug 1890685, and bug 1890898?

Question 4: If the answer to question 3 is no, why not?

Question 5: If the answer to question 3 is yes, when will Entrust commit to providing those action items? I will remind you before you answer this question that appropriate action items should have been available nearly two months ago.

Question 6: Does Entrust understand the criteria for what is an appropriate action item for a failed revocation bug and what is not? If the answer is yes, when did Entrust first come to this understanding?

Question 7: What is the reason that Entrust has failed for such a long time to provide appropriate action items for the four bugs listed in question 3? Please be specific and detailed in your response.

Question 8: What is the reason that, after receiving repeated comments on multiple incidents clarifying what is and is not appropriate as an action item for a failed revocation bug, Entrust has as late as June 1 continued to offer inappropriate action items while failing to offer appropriate action items for its failed revocation bugs? Please be specific and detailed in your response.

Question 9: Considering that in the cases of bug 1890685 and bug 1890898 large numbers of certificates remain unrevoked, does Entrust intend to apply any action items immediately to conclude these failed revocation events?

(In reply to Wayne from comment #7)

As this is a delayed revocation incident, you will need to provide a per-subscriber breakdown of what stopped Entrust from revoking within 5 days. I would emphasize that, given the quantity of delayed revocation incidents by Entrust lately, a subscriber using an excuse more than once is a problem.

A delayed revocation event is not supposed to repeat, especially for the same subscribers. It is the Certificate Authority's duty to revoke within the required deadlines. Please show that Entrust is capable of operating to this bare minimum.

No subscribers requested delayed revocation in this incident. We revoked and re-issued all affected certificates within five days after the mis-issuance was confirmed. Our investigation found that we should have actioned this issue earlier, so we filed this incident based on the time when the incident should have been confirmed.

(In reply to ngook.kong from comment #12)

(In reply to Wayne from comment #7)

As this is a delayed revocation incident, you will need to provide a per-subscriber breakdown of what stopped Entrust from revoking within 5 days. I would emphasize that, given the quantity of delayed revocation incidents by Entrust lately, a subscriber using an excuse more than once is a problem.

A delayed revocation event is not supposed to repeat, especially for the same subscribers. It is the Certificate Authority's duty to revoke within the required deadlines. Please show that Entrust is capable of operating to this bare minimum.

No subscribers requested delayed revocation in this incident. We revoked and re-issued all affected certificates within five days after the mis-issuance was confirmed. Our investigation found that we should have actioned this issue earlier, so we filed this incident based on the time when the incident should have been confirmed.

Then what is this incident even about? As part of a delayed revocation incident, a list of the certificates that were not revoked within the required time period should be provided, with per-subscriber reasons included. Entrust needs to understand that if it is creating incident reports, then the reports must be consistent with the policies requiring them.

106 certificates are listed in this incident; as this is the delayed revocation incident, Entrust is telling us that all of these are the impacted certificates. Talk me through how this incident was both handled within five days after the mis-issuance was confirmed and is also a delayed revocation incident.

Flags: needinfo?(ngook.kong)

(In reply to Wayne from comment #7)

As this is a delayed revocation incident, you will need to provide a per-subscriber breakdown of what stopped Entrust from revoking within 5 days. I would emphasize that, given the quantity of delayed revocation incidents by Entrust lately, a subscriber using an excuse more than once is a problem.

A delayed revocation event is not supposed to repeat, especially for the same subscribers. It is the Certificate Authority's duty to revoke within the required deadlines. Please show that Entrust is capable of operating to this bare minimum.

No subscribers requested delayed revocation in this incident. We revoked and re-issued all affected certificates within five days after the mis-issuance was confirmed. Our investigation found that we should have actioned this issue earlier, so we filed this incident based on the time when the incident should have been confirmed.

(In reply to JR Moir from comment #9)

Ngook:
From 2a. is it right that you are saying that you will continue to intentionally support your subscribers in delaying revocations, despite it being a clear part of the guidelines not to?
With all the Entrust incidents and those from other CAs, how can you make this an 'action point'?
Is the community meant to believe that you do not violate other parts of the guidelines to make things 'easier' for your customers?
Why can you not commit to just following the rules that are set?

Thanks, JR. While we intend to strengthen our processes around review and approval of delayed revocation requests from subscribers in the future, we are not committing to never having a delayed revocation. The Mozilla Root Store Policy recognizes exceptional circumstances may arise that warrant delayed revocation and it is the practice of many other CAs who are part of this community to allow delayed revocation requests. We will commit to granting these requests on a very limited basis and following a strict, pre-defined process.

(In reply to amir from comment #10)

I think I’ve said this a dozen times so far. Entrust, you can not “prevent” delayed revocation by trying to prevent incidents that lead to a revocation requirement. I can promise you that incidents necessitating revocation and mass revocation will happen. So the prevention step in #1 will not prevent delayed revocations. The only thing that will prevent delayed revocations is doing the revocations on time.

As for #2, at this point I think the community needs to know the exact criteria you’ll be using for delaying revocation. You’ve made a promise in the past that you won’t delay revocation ever again, and now you’re going against that promise by making another promise that you’ll have specific criteria.

I formally ask Mozilla to not consider this bug resolved until Entrust explicitly details the decision matrix and flowcharts they will be using to make that determination.

I also would like to ask that Entrust be required to show how each subscriber in their current open delayed revocation incidents would fall in that decision matrix. Specifically I’d like to see this question answered: are there any subscribers where a delayed revocation was granted before but wouldn’t be in the future under this framework?

Thank you, Amir. In the forthcoming report that will be published for Mozilla and the community on June 7, we will commit to developing a strengthened process around review and approval of delayed revocation requests in the future.

(In reply to Wayne from comment #13)

(In reply to ngook.kong from comment #12)

(In reply to Wayne from comment #7)

As this is a delayed revocation incident you will need to provide a per-subscriber breakdown of what stopped Entrust from revoking within 5 days. I would emphasize that, given the quantity of delayed revocation incidents by Entrust lately, a subscriber using an excuse more than once is a problem.

A delayed revocation event is not supposed to repeat, especially for the same subscribers. It is the Certificate Authority's duty to revoke within the required deadlines. Please show that Entrust is capable of operating to this bare minimum.

No subscribers requested delayed revocation in this incident. We revoked and re-issued all affected certificates within five days after the mis-issuance was confirmed. Our investigation found that we should have actioned this issue earlier, so we filed this incident based on the time when the incident should have been confirmed.

Then what is this incident even about? As part of a delayed revocation incident, a list of certificates that were not revoked within the required time period should be provided, with per-subscriber reasons included. Entrust need to understand that if they are creating incident reports then the reports must be consistent with the policies requiring them.

106 certificates are listed in this incident; as this is the delayed revocation incident, Entrust are telling us that all of these are the impacted certificates. Talk me through how this incident is both handled within five days after the mis-issuance was confirmed, but also a delayed revocation incident?

Once we confirmed the incident, we revoked all affected certificates within five days as required. However, we should have detected and escalated the issue for investigation and confirmation prior to the date on which it was escalated. As a result, we filed a delayed revocation report. Once subscribers were notified of the issue, however, all affected certificates were revoked within the five-day timeline. We hope this further explanation helps to clear any confusion you may have around the filing of the report.

(In reply to Bruce Morton from comment #17)

Once we confirmed the incident, we revoked all affected certificates within five days as required. However, we should have detected and escalated the issue for investigation and confirmation prior to the date on which it was escalated. As a result, we filed a delayed revocation report. Once subscribers were notified of the issue, however, all affected certificates were revoked within the five-day timeline. We hope this further explanation helps to clear any confusion you may have around the filing of the report.

Frankly it doesn't. If no 'subscribers' requested delayed revocation, and no certificates have been identified as outwith the 5-day window by Entrust then what is this incident covering? Entrust has raised an incident then claimed nothing occurred, so what is the purpose of this incident?

The reason I asked for a per-subscriber breakdown is because, as part of this being a delayed revocation event, there need to have been some certificates that went past the threshold of 5 days by the CA's definition. What information do we have to work with on that front?

Notably Entrust have overlooked answering Comment 11 in this incident.

Entrust: There are no replies to the questions from Tim Callan in https://bugzilla.mozilla.org/show_bug.cgi?id=1898848#c11

Why are you still unable to keep basic timelines and answer questions within the required 7 days?

(That is also a question for you: explain why you can't answer in 7 days, and also answer all of Mr. Callan's questions.)

Can you please explain the process you use to keep track of your time requirements for answering these questions?

I’m really curious if there is any formal method of tracking these because based off of how often you’ve missed the deadline on them it really seems like there’s no actual process.

(In reply to Tim Callan from comment #11)

(In reply to ngook.kong from comment #8)

Hi, Amir. There are several actions we have taken or will be taking in the near term to prevent delayed revocations in the future.

1. Prevention:
a. We will be expanding our use of linters pre- and post-issuance for all certificate types as well as expanding our use of linters for other cryptographic or compliance objects.
b. We will also be implementing a robust cross-functional change control process to cover full lifecycle planning and execution of changes for public trust certificates with layers of redundancy and approval.

2. Formal revocation handling process:
a. We will further formalize the process for ensuring that any delayed revocation requests meet established criteria for approval, are granted on a very limited basis with a presumption of denial and are communicated to the community with the required level of detail.

3. Subscriber Education & Advisory:
a. We will be launching new, targeted communication and education around the requirements for public trust certificates, alternatives available to Subscribers (e.g., use of private certificates) and necessity of automation to enable rapid revocation and replacement of affected certificates.

We also look forward to further discussion through the CA/B Forum at the appropriate time on the current revocation timelines and related requirements to balance the needs of the web ecosystem against the realities for Subscribers. More details on these action items, including timelines for completion, will be supplied in our forthcoming report to Mozilla and the Bugzilla community.

Ngook,
You have joined others from Entrust in simply ignoring the extensive commentary on your open bugs from the community about your CA’s willful ignorance of revocation requirements. It is painful to watch Entrust continually pretending it hasn’t read the comments that do not support its narrative.

So I will make this comment abundantly clear, and I will include questions to make it unambiguous that an on point reply is expected from Entrust within seven days.

This is one of five delayed revocation bugs Entrust has open right now. So far as I can see, Entrust has neglected in four of these five bugs to provide action items that deal with the failure to revoke on time. I will remind you once again of what Entrust repeatedly has been reminded of on this forum, which is that action items addressing the original cause of misissuance are not action items addressing the failure to revoke. Rather, in a delayed revocation bug, the community expects the CA to provide action items that address the specific causes of this specific failure, that look not just at surface level causes but also at true root causes, and that can reasonably be expected to prevent this error in the future. Repeated failures of the same nature will be looked upon unfavorably, as they reveal a CA that is failing to learn from its and the community’s mistakes and continually improve.

Entrust created the first of its five open bugs regarding failure to revoke correctly on March 20, 2024 for its refusal to meet BR requirements for an incident that it knew about on March 4. That report did not contain action items that could reasonably be expected to prevent repetition of this problem, and 41 comments later the thread still does not. Four more failed revocation bugs followed since then, and only the last of them, this one, contains any action items that address the root cause of the failure to revoke. As this bug has a different root cause than the other four, the analysis and action items in comment 1 do not contribute to resolution of the repeated problem exemplified in the other four bugs.

I will further remind the community of what you also are well aware of, which is that the Subscriber’s inability to manage basic certificate operations with resilience and agility is no excuse for the CA’s failure to follow required revocation behaviors detailed in the Baseline Requirements.

With all this background, it is noteworthy that Entrust continues not to provide clear, concrete action items to solve this ongoing deficiency. Your points 1.a and 1.b above relate to original misissuance and not the failure to revoke. Your point 2.a begins with the premise that you will be conducting late revocations again in the future, which is antithetical to the point of this entire exercise. Your point 3.a is about encouraging Subscriber resiliency and as such does not address Entrust’s failure in this and its other late revocation incidents. And Entrust is yet to provide appropriate action items for your four other, open, failed revocation incidents.

One possible conclusion is that Entrust has no intention ever to obey mandated revocation timelines, and therefore that Entrust will continue simply to refuse to commit to appropriate action items. I hope that is not the case.

If that is not the case, then Entrust should list a set of genuine, concrete, measurable action items that convincingly stand to rectify the CA’s recidivism and ensure no new delayed revocation bugs ever. I will remind you once again that such action items are not actions to prevent or minimize misissuance and are not to address Subscribers’ unwillingness or inability to swap out certificates in time spans shorter than 120 hours. The action items I describe here will be changes Entrust will make and be held accountable for that will ensure, regardless of any Subscriber situation or behavior whatsoever, that any misissued certificate at any time in the future will be revoked within 120 or 24 hours, in any volume up to and including Entrust’s entire corpus of active certificates.

So now, with all that preamble, here are my questions.

Question 1: Does Entrust intend in the future to obey mandated revocation timelines? Please provide a definitive and clear answer, avoiding opaque noncommittal language.

As a Certification Authority, we take seriously the requirements set by the CA/Browser Forum and the root programs and intend to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

Note that this bug is in regard to delayed revocation for the Jurisdiction issue; the delay was due to errors we made in not escalating the issue earlier, as outlined here. Once the issue was confirmed, the affected certificates in this bug were revoked within five days.

Question 2: If the answer to question 1 is no, why not?

See our response to Question 1.

Question 3: Does Entrust intend to provide relevant action items for bug 1887705, bug 1886532, bug 1890685, and bug 1890898?

The report we shared to the Mozilla community listed relevant action items for all of our incidents, which are being tracked together on bug 1901270. As we continue our development of plans based on these action items and review the feedback on this report, we will provide additional relevant detail to further address community questions and commentary.

The action item section in bug 1887705 refers to actions identified in bug 1886467 and bug 1886532; we can add them to bug 1887705 directly.

Regarding bug 1890685 and bug 1890898, on review we determined that these were mistakenly declared mis-issuances. See the updates to these bugs.

We also think it is important to share with the community what we have learned from these incidents about how subscribers deploy certificates in their environments. Our investigation makes it clear that education and automation are insufficient. Many use cases for large global businesses require the use of publicly rooted certificates; in the payments ecosystem, as an example, the distributed ecosystem and operating standards don't allow five-day revocation, putting different requirements into conflict. Also, most large businesses already have automation and use CLMs to manage their certificate estates and still found it impossible to revoke and renew within a five-day window.

We are actively engaging in discussions about use of private trust certificates for use cases where customers have demonstrated challenges meeting the requirements, and automation for customers required to deploy publicly rooted certificates amid conflicting standards.

We are committing to identifying opportunities to build resilience into our subscribers' certificate use and to work with the CA/Browser Forum to support industry efforts to resolve conflicts that may raise risk for the web ecosystem.

Question 4: If the answer to question 3 is no, why not?

N/A

Question 5: If the answer to question 3 is yes, when will Entrust commit to providing those action items? I will remind you before you answer this question that appropriate action items should have been available nearly two months ago.

As noted above, we identified action items in our report to Mozilla posted on June 7 and are tracking action items on Bugzilla.

We can also add additional action items to the incident report specifically in bug 1887705. As we continue our reviews and investigations based on these action items and feedback to the report we submitted to the Mozilla community on June 7, we plan to provide addenda that share additional relevant detail to further address community feedback.

Question 6: Does Entrust understand the criteria for what is an appropriate action item for a failed revocation bug and what is not? If the answer is yes, when did Entrust first come to this understanding?

As noted in question 1 above, we agree that it is critical for CAs to be prepared to follow revocation requirements in the TLS Baseline Requirements and we intend to follow the rules set out by the trusted root programs.

Question 7: What is the reason that Entrust has failed for such a long time to provide appropriate action items for the four bugs listed in question 3? Please be specific and detailed in your response.

See our response to question #3.

Question 8: What is the reason that, after receiving repeated comments on multiple incidents clarifying what is and is not appropriate as an action item for a failed revocation bug, Entrust has as late as June 1 continued to offer inappropriate action items while failing to offer appropriate action items for its failed revocation bugs? Please be specific and detailed in your response.

For the two failed revocation bugs, we issued an incident report based on what we had understood, but continued to investigate the issue and its root causes.

In doing this and reviewing our own CPS, we realized that when the CPS is read as a whole, our intention to comply with the CA/Browser Forum certificate profiles -- in the TLS BRs and EV Guidelines -- is clearly communicated, and the CPS directs us to prioritize these over any inconsistent provision of the CPS.

As a result, we concluded that we had originally mis-characterized this incident as a mis-issuance. See bug 1890685, comment 29, and bug 1890898, comment 28.

Question 9: Considering that in the cases of bug 1890685 and bug 1890898 large numbers of certificates remain unrevoked, does Entrust intend to apply any action items immediately to conclude these failed revocation events?

As noted above, we mistakenly declared these certificates mis-issued.

Flags: needinfo?(ngook.kong)

(In reply to ngook.kong from comment #21)

Question 1: Does Entrust intend in the future to obey mandated revocation timelines? Please provide a definitive and clear answer, avoiding opaque noncommittal language.

As a Certification Authority, we take seriously the requirements set by the CA/Browser Forum and the root programs and intend to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

This was a Yes or No question. Please provide an answer that is 'Yes' or 'No'.

Question 2: If the answer to question 1 is no, why not?

See our response to Question 1.

No answer was provided to question 1. Please answer this again in a transparent fashion.

Question 3: Does Entrust intend to provide relevant action items for bug 1887705, bug 1886532, bug 1890685, and bug 1890898?

The report we shared to the Mozilla community listed relevant action items for all of our incidents, which are being tracked together on bug 1901270. As we continue our development of plans based on these action items and review the feedback on this report, we will provide additional relevant detail to further address community questions and commentary.

The action item section in bug 1887705 refers to actions identified in bug 1886467 and bug 1886532; we can add them to bug 1887705 directly.

Regarding bug 1890685 and bug 1890898, on review we determined that these were mistakenly declared mis-issuances. See the updates to these bugs.

I would gather from this response that the answer is 'No'?

We also think it is important to share with the community what we have learned from these incidents about how subscribers deploy certificates in their environments. Our investigation makes it clear that education and automation are insufficient. Many use cases for large global businesses require the use of publicly rooted certificates; in the payments ecosystem, as an example, the distributed ecosystem and operating standards don't allow five-day revocation, putting different requirements into conflict. Also, most large businesses already have automation and use CLMs to manage their certificate estates and still found it impossible to revoke and renew within a five-day window.

We are actively engaging in discussions about use of private trust certificates for use cases where customers have demonstrated challenges meeting the requirements, and automation for customers required to deploy publicly rooted certificates amid conflicting standards.

We are committing to identifying opportunities to build resilience into our subscribers' certificate use and to work with the CA/Browser Forum to support industry efforts to resolve conflicts that may raise risk for the web ecosystem.

I don't know why Entrust felt the need to deliver a sales pitch here. However I will note that Entrust have said they have subscribers that "found it impossible to revoke and renew within a five-day window", and have re-issued certificates to them.

The only risk I can see to the web ecosystem so far is that Entrust are not performing their duties as a Certificate Authority. It is disturbing that this many months in, and after a report that was supposed to address these issues, we are still no further forward with regards to internal reflection. Please do the community a favor and answer these questions in a simple, direct, and honest fashion.

Question 4: If the answer to question 3 is no, why not?

N/A

Why is this N/A? I can see no statement confirming that Entrust will add additional action items to those issues.

Question 5: If the answer to question 3 is yes, when will Entrust commit to providing those action items? I will remind you before you answer this question that appropriate action items should have been available nearly two months ago.

As noted above, we identified action items in our report to Mozilla posted on June 7 and are tracking action items on Bugzilla.

Okay, so Entrust believes they answer 'Yes' to question 3? Can Entrust clarify which action items fit the incidents provided in Question 3, and when they will be resolved?

We can also add additional action items to the incident report specifically in bug 1887705. As we continue our reviews and investigations based on these action items and feedback to the report we submitted to the Mozilla community on June 7, we plan to provide addenda that share additional relevant detail to further address community feedback.

There seems to be a misunderstanding: over a month was provided to produce a report. Are Entrust proposing that they are planning to rewrite their answers from that report, similar to recent incidents? Any and all relevant information should have been provided to show how transparent and engaged Entrust are in addressing incidents.

Question 6: Does Entrust understand the criteria for what is an appropriate action item for a failed revocation bug and what is not? If the answer is yes, when did Entrust first come to this understanding?

As noted in question 1 above, we agree that it is critical for CAs to be prepared to follow revocation requirements in the TLS Baseline Requirements and we intend to follow the rules set out by the trusted root programs.

Given that wasn't the question posed at all, I would presume the answer is 'No'? This is about action items. I can't grasp what level of misunderstanding is going on here that revocation requirements got mentioned. Can Entrust clarify this at all?

Question 7: What is the reason that Entrust has failed for such a long time to provide appropriate action items for the four bugs listed in question 3? Please be specific and detailed in your response.

See our response to question #3.

Can Entrust show where this question was answered in Question 3?

Question 8: What is the reason that, after receiving repeated comments on multiple incidents clarifying what is and is not appropriate as an action item for a failed revocation bug, Entrust has as late as June 1 continued to offer inappropriate action items while failing to offer appropriate action items for its failed revocation bugs? Please be specific and detailed in your response.

For the two failed revocation bugs, we issued an incident report based on what we had understood, but continued to investigate the issue and its root causes.

In doing this and reviewing our own CPS, we realized that when the CPS is read as a whole, our intention to comply with the CA/Browser Forum certificate profiles -- in the TLS BRs and EV Guidelines -- is clearly communicated, and the CPS directs us to prioritize these over any inconsistent provision of the CPS.

As a result, we concluded that we had originally mis-characterized this incident as a mis-issuance. See bug 1890685, comment 29, and bug 1890898, comment 28.

Please read the question - it is about action items. Can Entrust answer the question as written, or if not can they rephrase what they perceived the question to be?

Question 9: Considering that in the cases of bug 1890685 and bug 1890898 large numbers of certificates remain unrevoked, does Entrust intend to apply any action items immediately to conclude these failed revocation events?

As noted above, we mistakenly declared these certificates mis-issued.

I appreciate there seems to be a complete breakdown in reading comprehension lately, however this question was about Action Items. Please answer the question that was provided.

Flags: needinfo?(ngook.kong)

(In reply to Wayne from comment #18)

(In reply to Bruce Morton from comment #17)

Once we confirmed the incident, we revoked all affected certificates within five days as required. However, we should have detected and escalated the issue for investigation and confirmation prior to the date on which it was escalated. As a result, we filed a delayed revocation report. Once subscribers were notified of the issue, however, all affected certificates were revoked within the five-day timeline. We hope this further explanation helps to clear any confusion you may have around the filing of the report.

Frankly it doesn't. If no 'subscribers' requested delayed revocation, and no certificates have been identified as outwith the 5-day window by Entrust then what is this incident covering? Entrust has raised an incident then claimed nothing occurred, so what is the purpose of this incident?

The reason I asked for a per-subscriber breakdown is because, as part of this being a delayed revocation event, there need to have been some certificates that went past the threshold of 5 days by the CA's definition. What information do we have to work with on that front?

Notably Entrust have overlooked answering Comment 11 in this incident.

We stated our reasons for filing this report above.

Flags: needinfo?(ngook.kong)

(In reply to ngook.kong from comment #23)

(In reply to Wayne from comment #18)

(In reply to Bruce Morton from comment #17)

Once we confirmed the incident, we revoked all affected certificates within five days as required. However, we should have detected and escalated the issue for investigation and confirmation prior to the date on which it was escalated. As a result, we filed a delayed revocation report. Once subscribers were notified of the issue, however, all affected certificates were revoked within the five-day timeline. We hope this further explanation helps to clear any confusion you may have around the filing of the report.

Frankly it doesn't. If no 'subscribers' requested delayed revocation, and no certificates have been identified as outwith the 5-day window by Entrust then what is this incident covering? Entrust has raised an incident then claimed nothing occurred, so what is the purpose of this incident?

The reason I asked for a per-subscriber breakdown is because, as part of this being a delayed revocation event, there need to have been some certificates that went past the threshold of 5 days by the CA's definition. What information do we have to work with on that front?

Notably Entrust have overlooked answering Comment 11 in this incident.

We stated our reasons for filing this report above.

Could you refer me to these reasons? In this 'delayed revocation' incident can you confirm the following facts as Entrust understands them:

  • 0 certificates are outwith the 5-day revocation timeframe
  • 0 subscribers requested delayed revocation

You understand that this is a 'delayed revocation' incident that covers zero certificates, right? If the intent is to correct the mis-issuance date and then have more than 0 certificates, then commit to one answer. Unless there will be an investigation in a few months to correct Entrust's understanding of the situation?
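
The disagreement above comes down to which timestamp starts the 5-day (120-hour) clock. A minimal sketch of that arithmetic follows. Only the revocation time (May 21, 2024, 9:30 AM UTC) comes from the incident report; the two clock-start dates and the serial labels are hypothetical placeholders, since the thread does not state them.

```python
from datetime import datetime, timedelta, timezone

# 5-day revocation window discussed in the thread, expressed in hours.
REVOCATION_WINDOW = timedelta(hours=120)


def delayed(revoked_at, clock_start, window=REVOCATION_WINDOW):
    """Return the serials whose revocation came more than `window` after `clock_start`."""
    return sorted(
        serial for serial, ts in revoked_at.items() if ts - clock_start > window
    )


# Revocation time from the incident summary; serial labels are made up.
revoked_at = {
    "cert-A": datetime(2024, 5, 21, 9, 30, tzinfo=timezone.utc),
    "cert-B": datetime(2024, 5, 21, 9, 30, tzinfo=timezone.utc),
}

# HYPOTHETICAL clock starts, for illustration only.
confirmed = datetime(2024, 5, 17, 0, 0, tzinfo=timezone.utc)
should_have_known = datetime(2024, 5, 10, 0, 0, tzinfo=timezone.utc)

late_from_confirmation = delayed(revoked_at, confirmed)
late_from_awareness = delayed(revoked_at, should_have_known)
```

Under these assumed dates, counting from confirmation yields no delayed certificates, while counting from the earlier date marks every certificate as delayed. A report therefore has to commit to one clock start before it can list the affected certificates.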

Flags: needinfo?(ngook.kong)

(In reply to JR Moir from comment #19)

Entrust: There are no replies to the questions from Tim Callan in https://bugzilla.mozilla.org/show_bug.cgi?id=1898848#c11

Why are you still unable to keep basic timelines and answer questions within the required 7 days?

(That is also a question for you: explain why you can't answer in 7 days, and also answer all of Mr. Callan's questions.)

On the response time ... We have been working diligently to address the high volume of comments received on these incidents.

Many comments are long expressions of opinion that offer personal characterizations of these incidents. While we don't always agree, we have opted not to argue - everyone has a right to their opinions. We have corrected these statements when appropriate.

We have spent a great deal of time addressing questions that seem by their tone and style to be more interrogations than attempts to learn something new that would be helpful to the community. Other questions -- even those attempting to express clarity -- are confusing and require a great deal of backtracking to determine what is actually being asked. This has led us to put greater scrutiny on our responses than we otherwise would.

Flags: needinfo?(ngook.kong)

(In reply to amir from comment #20)

Can you please explain the process you use to keep track of your time requirements for answering these questions?

I’m really curious if there is any formal method of tracking these because based off of how often you’ve missed the deadline on them it really seems like there’s no actual process.

Yes, we do have a formal methodology to track and respond. We are tracking every question by date of posting and working to respond to each one within seven days. We do understand that we can respond simply by acknowledging the question and noting a later date when we will respond; we will work to ensure that we do this for all responses that may run over seven days. We would benefit from your and the community’s feedback if there are better tools and processes available or being used.
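
As a rough illustration of the tracking described above (each question logged by posting date, a seven-day response window, and an acknowledgment that names a later date), here is a minimal sketch. It is not Entrust's actual tooling; the `Question` class, its field names, and the example dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Seven-day response window mentioned in the thread.
RESPONSE_WINDOW = timedelta(days=7)


@dataclass
class Question:
    comment_id: int                      # hypothetical identifier
    posted: date
    answered: Optional[date] = None
    acknowledged: Optional[date] = None  # interim "we will respond by ..." reply
    promised_by: Optional[date] = None   # later date named in the acknowledgment

    def due(self) -> date:
        """Deadline for a first response: posting date plus seven days."""
        return self.posted + RESPONSE_WINDOW

    def deadline(self) -> date:
        """Effective deadline; extended only by a timely acknowledgment."""
        timely_ack = self.acknowledged is not None and self.acknowledged <= self.due()
        if timely_ack and self.promised_by is not None:
            return self.promised_by
        return self.due()

    def status(self, today: date) -> str:
        if self.answered is not None:
            return "answered" if self.answered <= self.deadline() else "answered late"
        return "overdue" if today > self.deadline() else "open"


# Hypothetical example dates.
today = date(2024, 6, 10)
unanswered = Question(comment_id=1, posted=date(2024, 6, 1))
acked = Question(
    comment_id=2,
    posted=date(2024, 6, 1),
    acknowledged=date(2024, 6, 5),
    promised_by=date(2024, 6, 14),
)
```

With these made-up dates, `unanswered.status(today)` reports the question as overdue, while `acked.status(today)` is still open because the acknowledgment arrived inside the window and named a later date.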

(In reply to Wayne from comment #22)

(In reply to ngook.kong from comment #21)

Question 1: Does Entrust intend in the future to obey mandated revocation timelines? Please provide a definitive and clear answer, avoiding opaque noncommittal language.

As a Certification Authority, we take seriously the requirements set by the CA/Browser Forum and the root programs and intend to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

This was a Yes or No question. Please provide an answer that is 'Yes' or 'No'.

We acknowledge your question, and the response continues to be Yes, as a CA, we take seriously the requirements set by the CA/Browser Forum and the root programs and intend to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

Question 2: If the answer to question 1 is no, why not?

See our response to Question 1.

No answer was provided to question 1. Please answer this again in a transparent fashion.

See our above answer to Question 1.

Question 3: Does Entrust intend to provide relevant action items for bug 1887705, bug 1886532, bug 1890685, and bug 1890898?

The report we shared to the Mozilla community listed relevant action items for all of our incidents, which are being tracked together on bug 1901270. As we continue our development of plans based on these action items and review the feedback on this report, we will provide additional relevant detail to further address community questions and commentary.

The action item section in bug 1887705 refers to actions identified in bug 1886467 and bug 1886532; we can add them to bug 1887705 directly.

Regarding bug 1890685 and bug 1890898, on review we determined that these were mistakenly declared mis-issuances. See the updates to these bugs.

I would gather from this response that the answer is 'No'?

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized.

We also think it is important to share with the community what we have learned from these incidents about how subscribers deploy certificates in their environments. Our investigation makes it clear that education and automation are insufficient. Many use cases for large global businesses require the use of publicly rooted certificates; in the payments ecosystem, as an example, the distributed ecosystem and operating standards don't allow five-day revocation, putting different requirements into conflict. Also, most large businesses already have automation and use CLMs to manage their certificate estates and still found it impossible to revoke and renew within a five-day window.

We are actively engaging in discussions about use of private trust certificates for use cases where customers have demonstrated challenges meeting the requirements, and automation for customers required to deploy publicly rooted certificates amid conflicting standards.

We are committing to identifying opportunities to build resilience into our subscribers' certificate use and to work with the CA/Browser Forum to support industry efforts to resolve conflicts that may raise risk for the web ecosystem.

I don't know why Entrust felt the need to deliver a sales pitch here. However I will note that Entrust have said they have subscribers that "found it impossible to revoke and renew within a five-day window", and have re-issued certificates to them.

The only risk I can see to the web ecosystem so far is that Entrust are not performing their duties as a Certificate Authority. It is disturbing that this many months in and after a report that was supposed to address these issues we are still no further forward with regards to internal reflection. Please do the community a favor and answer these questions in a simple, direct, and honest fashion.

Question 4: If the answer to question 3 is no, why not?

N/A

Why is this N/A? I can see no statement confirming that Entrust will add additional action items to those issues.

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized.

Question 5: If the answer to question 3 is yes, when will Entrust commit to providing those action items? I will remind you before you answer this question that appropriate action items should have been available nearly two months ago.

As noted above, we identified action items in our report to Mozilla posted on June 7 and are tracking action items on Bugzilla.

Okay, so Entrust believes they answer 'Yes' to question 3? Can Entrust clarify which action items fit the incidents provided in Question 3, and when they will be resolved?

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized. There are also timelines associated with each action item detailed in the report.

We can also add additional action items to the incident report specifically in bug 1887705. As we continue our reviews and investigations based on these action items and feedback to the report we submitted to the Mozilla community on June 7, we plan to provide addenda that share additional relevant detail to further address community feedback.

There seems to be a misunderstanding, over a month was provided to produce a report. Are Entrust proposing that they are planning to rewrite their answers from that report, similar to recent incidents? Any and all relevant information should have been provided to show how transparent and engaged Entrust are in addressing incidents.

We have received a lot of feedback on the report, including from you, on the MDSP thread and we plan to address that feedback. Part of that response will include providing additional detail as requested on certain action items.

Question 6: Does Entrust understand the criteria for what is an appropriate action item for a failed revocation bug and what is not? If the answer is yes, when did Entrust first come to this understanding?

As noted in question 1 above, we agree that it is critical for CAs to be prepared to follow revocation requirements in the TLS Baseline Requirements and we intend to follow the rules set out by the trusted root programs.

Given that wasn't the question posed at all, I would presume the answer is 'No'? This is about action items. I can't grasp at what level of misunderstanding is going on here that revocation requirements got mentioned. Can Entrust clarify this at all?

Yes we understand the criteria. We didn’t quite understand the relevance of your second question and assume it is about delayed revocation hence the response that we have included action items in our June 7 report to address delayed revocation. Those action items can be found under the section entitled Delayed Revocation.

Question 7: What is the reason that Entrust has failed for such a long time to provide appropriate action items for the four bugs listed in question 3? Please be specific and detailed in your response.

See our response to question #3.

Can Entrust show where this question was answered in Question 3?

We were working on our report which was issued on June 7. The action items for the four bugs listed in Question 3 are contained in that report.

Question 8: What is the reason that, after receiving repeated comments on multiple incidents clarifying what is and is not appropriate as an action item for a failed revocation bug, Entrust has as late as June 1 continued to offer inappropriate action items while failing to offer appropriate action items for its failed revocation bugs? Please be specific and detailed in your response.

For the two failed revocation bugs, we issued an incident report based on what we had understood, but continued to investigate the issue and its root causes.

In doing this and reviewing our own CPS, we realized that when the CPS is read as a whole, our intention to comply with the CA/Browser Forum certificate profiles -- in the TLS BRs and EV Guidelines -- is clearly communicated, and the CPS directs us to prioritize these over any inconsistent provision of the CPS.

As a result, we concluded that we had originally mis-characterized this incident as a mis-issuance. See bug 1890685, comment 29, and bug 1890898, comment 28.

Please read the question - it is about action items. Can Entrust answer the question as written, or if not can they rephrase what they perceived the question to be?

We disagree that the action items that we have put forth to address delayed revocation are inappropriate action items.

Question 9: Considering that in the cases of bug 1890685 and bug 1890898 large numbers of certificates remain unrevoked, does Entrust intend to apply any action items immediately to conclude these failed revocation events?

As noted above, we mistakenly declared these certificates mis-issued.

I appreciate there seems to be a complete breakdown in reading comprehension lately, however this question was about Action Items. Please answer the question that was provided.

Action items have been detailed in our June 7 report to address and avoid CPS errors in the future. These CPS errors were not mis-issuances as detailed in our updated incident reports for both bug 1890685 and bug 1890898. We acknowledge recent posts on this topic and we are reviewing the points raised for action items.

(In reply to ngook.kong from comment #27)

We acknowledge your question, and the response continues to be Yes, as a CA, we take seriously the requirements set by the CA/Browser Forum and the root programs and intend to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

See our above answer to Question 1.

Thank you for answering my questions.

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized.

I am specifically referring to bug 1890685 and bug 1890898 which are missing from your MDSP report. It should be easy for Entrust to notice these incidents are missing from their formal report.

Question 4: If the answer to question 3 is no, why not?

N/A

Why is this N/A? I can see no statement confirming that Entrust will add additional action items to those issues.

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized.

Please read the question posed before copy and pasting answers that are not suitable.

Okay, so Entrust believes they answer 'Yes' to question 3? Can Entrust clarify which action items fit the incidents provided in Question 3, and when they will be resolved?

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized. There are also timelines associated with each action item detailed in the report.

If Entrust's answers were at all clear these questions would not be being posed. Please establish where such action items exist for bug 1890685 and bug 1890898.

We can also add additional action items to the incident report specifically in bug 1887705. As we continue our reviews and investigations based on these action items and feedback to the report we submitted to the Mozilla community on June 7, we plan to provide addenda that share additional relevant detail to further address community feedback.

There seems to be a misunderstanding, over a month was provided to produce a report. Are Entrust proposing that they are planning to rewrite their answers from that report, similar to recent incidents? Any and all relevant information should have been provided to show how transparent and engaged Entrust are in addressing incidents.

We have received a lot of feedback on the report, including from you, on the MDSP thread and we plan to address that feedback. Part of that response will include providing additional detail as requested on certain action items.

As long as Entrust do not plan on changing their answers and will instead add in further detail I feel this sufficiently addresses the question. I look forward to a timely reply to the feedback presented in the MDSP thread.

Question 6: Does Entrust understand the criteria for what is an appropriate action item for a failed revocation bug and what is not? If the answer is yes, when did Entrust first come to this understanding?

As noted in question 1 above, we agree that it is critical for CAs to be prepared to follow revocation requirements in the TLS Baseline Requirements and we intend to follow the rules set out by the trusted root programs.

Given that wasn't the question posed at all, I would presume the answer is 'No'? This is about action items. I can't grasp at what level of misunderstanding is going on here that revocation requirements got mentioned. Can Entrust clarify this at all?

Yes we understand the criteria. We didn’t quite understand the relevance of your second question and assume it is about delayed revocation hence the response that we have included action items in our June 7 report to address delayed revocation. Those action items can be found under the section entitled Delayed Revocation.

To emphasize the lack of understanding present: the question was specifically about a failed revocation bug. I feel Entrust have made their level of understanding clear at this point, however if they wish to provide an answer to the question later they are free to do so.

Question 7: What is the reason that Entrust has failed for such a long time to provide appropriate action items for the four bugs listed in question 3? Please be specific and detailed in your response.

See our response to question #3.

Can Entrust show where this question was answered in Question 3?

We were working on our report which was issued on June 7. The action items for the four bugs listed in Question 3 are contained in that report.

Unfortunately the action items for the four bugs listed in Question 3 are not contained in that report. To reiterate, this is specifically about bug 1890685 and bug 1890898.

As a result, we concluded that we had originally mis-characterized this incident as a mis-issuance. See bug 1890685, comment 29, and bug 1890898, comment 28.

Please read the question - it is about action items. Can Entrust answer the question as written, or if not can they rephrase what they perceived the question to be?

We disagree that the action items that we have put forth to address delayed revocation are inappropriate action items.

To reiterate, these are not delayed revocation incidents; they are failures to revoke. We have no action items on these incidents to date.

Question 9: Considering that in the cases of bug 1890685 and bug 1890898 large numbers of certificates remain unrevoked, does Entrust intend to apply any action items immediately to conclude these failed revocation events?

As noted above, we mistakenly declared these certificates mis-issued.

I appreciate there seems to be a complete breakdown in reading comprehension lately, however this question was about Action Items. Please answer the question that was provided.

Action items have been detailed in our June 7 report to address and avoid CPS errors in the future. These CPS errors were not mis-issuances as detailed in our updated incident reports for both bug 1890685 and bug 1890898. We acknowledge recent posts on this topic and we are reviewing the points raised for action items.

Action items in reference to those bugs are not in the report.

Flags: needinfo?(ngook.kong)

(In reply to Wayne from comment #24)

(In reply to ngook.kong from comment #23)

Could you refer me to these reasons? In this 'delayed revocation' incident can you confirm the following facts as Entrust understands them:

  • 0 certificates are outwith the 5-day revocation timeframe
  • 0 subscribers requested revocation

You are understanding that this is a 'delayed revocation' incident that covers zero certificates, right? If the intent is to correct the mis-issuance date and then have more than 0 certificates, then commit to one answer. Unless there will be an investigation in a few months to correct Entrust's understanding of the situation?

This would not be correct. By filing this incident, we are saying that the 74 certificates affected were revoked after the 5-day deadline from when we should have opened the incident. No customers requested this delay, since they weren’t aware of it until we confirmed the incident and notified them. All certificates were revoked within five days of this notification.

Again, we agree that it would have been easier to not file this bug, and by your analysis it does not seem to fit the standard definition of a delayed revocation incident.

Right, the essence of my point is that this does not fit any standard definition of a delayed revocation incident. That is mainly why I was questioning what the incident even truly covers. The point, as I understand it, of a delayed revocation incident is to identify the segment of certificates that are still live from a related incident. Any and all updates are then to follow up on revocation and expiration if appropriate.

What date does Entrust consider it to be for when they should have opened the incident? This inherent contradiction is one that I'm quite puzzled by.

(In reply to Wayne from comment #28)

(In reply to ngook.kong from comment #27)

We acknowledge your question, and the response continues to be Yes, as a CA, we take seriously the requirements set by the CA/Browser Forum and the root programs and intend to comply with them. In this we are guided by the TLS Baseline Requirements and Mozilla’s Responding to an Incident.

See our above answer to Question 1.

Thank you for answering my questions.

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized.

I am specifically referring to bug 1890685 and bug 1890898 which are missing from your MDSP report. It should be easy for Entrust to notice these incidents are missing from their formal report.

We have provided specific action items relating to those bugs in the Action Items section of the revised incident reports on each of those bugs.

Question 4: If the answer to question 3 is no, why not?

N/A

Why is this N/A? I can see no statement confirming that Entrust will add additional action items to those issues.

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized.

Please read the question posed before copy and pasting answers that are not suitable.

The question posed was “If the answer to question 3 is no, why not?” The answer to question 3 was not no, hence the N/A.

Okay, so Entrust believes they answer 'Yes' to question 3? Can Entrust clarify which action items fit the incidents provided in Question 3, and when they will be resolved?

The action items for these bugs are contained in the report provided to the MDSP thread and in the posted action items list in bug 1901270. It should be easy for you to tie the action items to the specific bug numbers because that is how the report is organized. There are also timelines associated with each action item detailed in the report.

If Entrust's answers were at all clear these questions would not be being posed. Please establish where such action items exist for bug 1890685 and bug 1890898.

The confusion may arise because you are asking about bugs that are not the current bug in which you are posting. In any event, the action items for bug 1890685 are found in the revised final incident report posted in Comment 29 on that bug; and the action items for bug 1890898 are found in the revised final incident report posted in Comment 28 on that bug.

Question 7: What is the reason that Entrust has failed for such a long time to provide appropriate action items for the four bugs listed in question 3? Please be specific and detailed in your response.

See our response to question #3.

Can Entrust show where this question was answered in Question 3?

We were working on our report which was issued on June 7. The action items for the four bugs listed in Question 3 are contained in that report.

Unfortunately the action items for the four bugs listed in Question 3 are not contained in that report. To reiterate, this is specifically about bug 1890685 and bug 1890898.

As a result, we concluded that we had originally mis-characterized this incident as a mis-issuance. See bug 1890685, comment 29, and bug 1890898, comment 28.

Please read the question - it is about action items. Can Entrust answer the question as written, or if not can they rephrase what they perceived the question to be?

We disagree that the action items that we have put forth to address delayed revocation are inappropriate action items.

To reiterate, these are not delayed revocation incidents; they are failures to revoke. We have no action items on these incidents to date.

The action items for bug 1890685 are found in the revised final incident report posted in Comment 29 on that bug; and the action items for bug 1890898 are found in the revised final incident report posted in Comment 28 on that bug.

Flags: needinfo?(ngook.kong)

(In reply to Wayne from comment #30)

Right, the essence of my point is that this does not fit any standard definition of a delayed revocation incident. That is mainly why I was questioning what the incident even truly covers. The point, as I understand it, of a delayed revocation incident is to identify the segment of certificates that are still live from a related incident. Any and all updates are then to follow up on revocation and expiration if appropriate.

What date does Entrust consider it to be for when they should have opened the incident? This inherent contradiction is one that I'm quite puzzled by.

We believe the primary purpose of any incident report is to discuss the underlying causes for what went wrong, as well as what is being done to remediate those underlying causes. In the case of a typical delayed revocation, the bug would normally be the place for the CA to explain why they had decided to delay revocation on a per subscriber basis, and also to share its analysis determining the factors that prevented timely revocation of the certificates, as well as remediation actions to prevent future revocation delays.

This is not a typical delayed revocation bug because we didn’t first identify a mis-issuance, notify subscribers, and then decide to delay revocation in response to specific requests. Again, the delay in this case was not due to a lack of willingness to revoke within the timelines, nor was it due to subscriber requests for delays or inability to safely revoke certificates within the timeframe. However, we did identify problems with our process that caused a delay, and we thought that this bug might be the appropriate way to transparently own up to those problems and to share our remediation actions. This bug is focused on the process issues that caused us not to revoke in a timely way, whereas Bug 1897630 is focused on the technical issue with the certificates, which had different causes and remediation actions.

We have not pinpointed a specific date when we should have declared a mis-issuance, opened the original incident or when we should have completed revocation. However, we believe that better processes should have allowed us to confirm and report the issue, notify subscribers and start revocation at least as early as April 4 or 5th, and possibly earlier. We are sharing these details with the community in transparency.

(In reply to Bruce Morton from comment #32)

We have not pinpointed a specific date when we should have declared a mis-issuance, opened the original incident or when we should have completed revocation. However, we believe that better processes should have allowed us to confirm and report the issue, notify subscribers and start revocation at least as early as April 4 or 5th, and possibly earlier. We are sharing these details with the community in transparency.

Given that this is a crucial aspect of any incident report... how have you not pinpointed a specific date? I'm not saying that there is not room for discussion on whether such a date should change and modify this incident. The part I'm questioning is what this incident at-this-moment-in-time is actually covering. An impacted certificate list in a delayed revocation event should not be about the original incident, but about the portion that was not revoked per the original incident reporting analysis.

How can there be an evaluation on adherence to delayed revocation, if no date exists or if such a date predicates that 0 certificates are impacted? I'm sure these are also questions on your side too, it is important we address these fundamental concepts. I'm concerned if this is going to be standard practice moving forward.

Flags: needinfo?(bruce.morton)

Action Items

Action Item | Kind | Due Date
Review applicable policy and procedures | Prevent | Done
Improve our internal problem reporting mechanism reported by internal staff | Detect | July 31, 2024
Reorganize product compliance and verification teams to provide additional organizational resources and oversight | Prevent | Done
Implement additional input validation controls for verification | Mitigate | July 26, 2024
Implement pkilint as post-issuance linter | Detect | Done
Flags: needinfo?(bruce.morton)

(In reply to Wayne from comment #33)

(In reply to Bruce Morton from comment #32)

We have not pinpointed a specific date when we should have declared a mis-issuance, opened the original incident or when we should have completed revocation. However, we believe that better processes should have allowed us to confirm and report the issue, notify subscribers and start revocation at least as early as April 4 or 5th, and possibly earlier. We are sharing these details with the community in transparency.

Given that this is a crucial aspect of any incident report... how have you not pinpointed a specific date? I'm not saying that there is not room for discussion on whether such a date should change and modify this incident. The part I'm questioning is what this incident at-this-moment-in-time is actually covering. An impacted certificate list in a delayed revocation event should not be about the original incident, but about the portion that was not revoked per the original incident reporting analysis.

How can there be an evaluation on adherence to delayed revocation, if no date exists or if such a date predicates that 0 certificates are impacted? I'm sure these are also questions on your side too, it is important we address these fundamental concepts. I'm concerned if this is going to be standard practice moving forward.

Fair point that we should be more definitive about declaring a specific date. On 2024-04-03 the issue was escalated and on 2024-04-04 the issue was confirmed. So, while we searched for the root cause we should have declared the mis-issuance and halted issuance on 2024-04-04.
