Closed Bug 1879602 Opened 8 months ago Closed 2 months ago

Entrust: OCSP response signed with SHA-1

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bruce.morton, Assigned: bruce.morton)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])


Summary

After learning about OCSP Watch (https://sslmate.com/labs/ocsp_watch/), our Operations team identified that two root CA OCSP responders were signing using SHA-1. We immediately scheduled a change to reconfigure these OCSP responders to sign with SHA-256.

Impact

OCSP responses providing the status of some CA certificates were signed with SHA-1 rather than SHA-256, as required by TLS BR section 7.1.3.2.1.
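To make the requirement concrete, the check a monitor performs on each response can be reduced to a lookup from the response's signatureAlgorithm OID to its hash. The following is a minimal Python sketch: the function name and the small OID table are illustrative assumptions, and extracting the OID from a real DER-encoded OCSP response (for example with an ASN.1 library) is left out.

```python
# Hypothetical helper: flag OCSP response signature algorithms that rely on SHA-1.
# The OID table is a small illustrative subset, not an exhaustive registry.
SIGNATURE_ALGORITHM_HASH = {
    "1.2.840.113549.1.1.5": "sha1",    # sha1WithRSAEncryption
    "1.2.840.10045.4.1": "sha1",       # ecdsa-with-SHA1
    "1.2.840.113549.1.1.11": "sha256", # sha256WithRSAEncryption
    "1.2.840.113549.1.1.12": "sha384", # sha384WithRSAEncryption
    "1.2.840.113549.1.1.13": "sha512", # sha512WithRSAEncryption
    "1.2.840.10045.4.3.2": "sha256",   # ecdsa-with-SHA256
    "1.2.840.10045.4.3.3": "sha384",   # ecdsa-with-SHA384
}


def uses_sha1(signature_algorithm_oid: str) -> bool:
    """Return True if the OID maps to a SHA-1 based signature algorithm.

    Unknown OIDs raise KeyError so they surface for manual review
    instead of silently passing the check.
    """
    return SIGNATURE_ALGORITHM_HASH[signature_algorithm_oid] == "sha1"
```

Raising on unknown OIDs is a deliberate choice in this sketch: a monitor that treats "unrecognized" as "compliant" would hide exactly the kind of drift described in this incident.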

Timeline

2024-02-03

  • 17:00 UTC – Operations reviewed OCSP Watch and saw an error reported for the Entrust.net Certification Authority (2048) (https://crt.sh/?caid=32) root OCSP responder. Operations then checked the OCSP monitoring system and found the same issue with the Entrust Root Certification Authority (https://crt.sh/?caid=99) root OCSP responder. No other errors were found.

2024-02-05

  • 18:00 UTC – Confirmed authorization to fix the error.
  • 19:00 UTC – Tested the fix in the staging environment and investigated the root cause.
  • 20:30 UTC – Submitted a change request for approval.

2024-02-06

  • 21:00 UTC – Applied the fix to production.
  • 21:15 UTC – OCSP monitoring was updated to confirm that the corrected OCSP responders sign with SHA-256.

Root Cause Analysis

Both root CAs have self-signed certificates signed with SHA-1. By default, the OCSP responder signs with the algorithm that was used to sign the CA certificate. A year before the OCSP SHA-1 sunset date, the offline root components were updated to sign with SHA-256. The online OCSP responders were not updated, so they did not meet the OCSP SHA-1 sunset date when it arrived.
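The default behaviour described above can be modelled in a few lines. This Python sketch is a hypothetical model, not the responder product's configuration API: the class and field names are assumptions. It encodes the two facts reported in this bug: an unset responder falls back to the CA certificate's signature algorithm, and the delegated OCSP signing certificate's algorithm is not consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponderConfig:
    # Hypothetical model of an OCSP responder configuration.
    ca_cert_signature_hash: str          # hash used to sign the (self-signed) CA certificate
    delegated_signer_hash: str           # hash used in the delegated OCSP signing certificate
    override_hash: Optional[str] = None  # explicit responder setting, if one was configured


def effective_response_hash(cfg: ResponderConfig) -> str:
    # An explicit override wins; otherwise the responder falls back to the
    # CA certificate's algorithm. The delegated signer's hash is never used,
    # which is why a SHA-256 delegated certificate did not help here.
    if cfg.override_hash is not None:
        return cfg.override_hash
    return cfg.ca_cert_signature_hash
```

For a root with a SHA-1 self-signed certificate and a SHA-256 delegated signer, `effective_response_hash(ResponderConfig("sha1", "sha256"))` yields `"sha1"` until `override_hash="sha256"` is set.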

Lessons Learned

What went well

  • Because Entrust had moved most CAs to SHA-256 or stronger, the default configuration did not result in SHA-1 signing in most instances.

What didn't go well

  • Configuration settings for all OCSP responders were not reviewed before the sunset date. The monitoring checks for the signing algorithm were unchanged from when monitoring was originally set up, and the monitoring software does not check against third-party rules.
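The gap described above is the difference between comparing against a value frozen at setup time and comparing against rules that take effect on a date. A minimal Python sketch of both styles follows; the rule table and function names are hypothetical, and only the SC-53 effective date is taken from this bug.

```python
from datetime import datetime, timezone

# Hypothetical rule table: (effective_from, set of forbidden hash algorithms).
# Only the SC-53 date comes from this bug; other rules would come from the
# requirements tracking system.
RULES = [
    (datetime(2022, 6, 1, tzinfo=timezone.utc), {"sha1"}),  # SC-53: no SHA-1 OCSP signing
]


def static_check(observed_hash: str, expected_hash: str) -> bool:
    # Pre-incident style: pass if the response matches whatever was
    # expected when monitoring was first configured.
    return observed_hash == expected_hash


def rule_check(observed_hash: str, at: datetime) -> bool:
    # Date-aware style: fail if any rule in effect at `at` forbids the hash.
    return not any(at >= start and observed_hash in forbidden for start, forbidden in RULES)
```

With a SHA-1 responder, `static_check("sha1", "sha1")` keeps passing indefinitely, while `rule_check("sha1", ...)` starts failing on 2022-06-01 without anyone having to remember to update the expected value.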

Where we got lucky

Action Items

| Action Item | Kind | Due Date |
|---|---|---|
| OCSP responders reconfigured to sign with SHA-256 | Correction | 2024-02-06 |
| OCSP responder monitoring configuration updated to expect SHA-256 | Detection | 2024-02-06 |
| Add OCSP Watch to daily monitoring | Detection | 2024-02-29 |

We will follow up if we decide to take any actions to prevent a similar error for a future ballot.

Appendix

Details of affected certificates

No certificates were affected.

Assignee: nobody → bruce.morton
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: Entrust - OCSP response signed with SHA-1 → Entrust: OCSP response signed with SHA-1
Whiteboard: [ca-compliance] [ocsp-failure]
Type: defect → task

Monitoring using OCSP Watch was added on 2024-02-09. Action items listed have been completed.

The OCSP SHA-1 sunset date was 2022-06-01, over 1.5 years before this incident was detected. It sounds to me like the root cause here is not one of monitoring (the monitors were successfully confirming that OCSP was being signed with the expected algorithm) but one of human process: keeping track of requirements and sunset dates, being aware of all systems potentially affected by changing requirements, and re-confirming compliance with requirements on a regular basis. Does Entrust plan to take any action items to ensure that similar requirements dates are not missed in the future?

We do have a system to track ballots and effective dates. In this case, we proactively implemented the change, which was completed for our offline roots. The online OCSP responders were intentionally excluded from these proactive tasks so that some SHA-1 hierarchies could continue to be supported. Unfortunately, we had no reminder to end that exception. We have updated the procedures for our Operational Authority to ensure that requirement dates are not missed and that ballot actions are closed only when all tasks are complete.

The timeline should include the relevant dates for the action items and reviews, or missed reviews, that led up to the incident. For example: when were the offline root components updated to sign with SHA-256, when did the requirements come into effect, and when was the first response that didn't follow the requirements (and therefore the start of the incident)?

Mozilla policy now requires CAs to perform a Compliance Self-Assessment annually. Had Entrust previously done a self-assessment, either following the CCADB framework or your own? What steps does Entrust take to confirm current operations meet requirements?

Apologies for the delay; we will update the timeline for this incident.

(In reply to Mathew Hodson from comment #5)

The timeline should include the relevant dates for the action items and reviews, or missed reviews, that led up to the incident. For example: when were the offline root components updated to sign with SHA-256, when did the requirements come into effect, and when was the first response that didn't follow the requirements (and therefore the start of the incident)?

Here is an update to the timeline:

Timeline (all times in UTC)

2020-09

  • Updated root and subordinate CAs to SHA-256 OCSP signing, with exceptions for Entrust.net Certification Authority (2048) (https://crt.sh/?caid=32) and Entrust Root Certification Authority (https://crt.sh/?caid=99), which were retained for backwards compatibility.

2022-01-24

  • CA/Browser Forum ballot SC-53, "Sunset for SHA-1 OCSP Signing", passed.

2022-02-16

  • Policy Authority meeting discussed CA/Browser Forum ballots and recorded the status of SC-53 as open.

2022-03

  • Updated the OCSP signing algorithm for a private trust CA and for the AffirmTrust Networking root and its associated subordinate CA. We discovered that the OCSP signing algorithm was not based on the delegated OCSP signing certificate, but on the signing algorithm associated with the CA certificate. The playbooks were updated, but the Entrust.net Certification Authority (2048) and Entrust Root Certification Authority were missed.

2022-05-18

  • Policy Authority meeting discussed CA/Browser Forum ballots and recorded the status of SC-53 as closed.

2022-05-31

  • Compliance team confirmed that we would no longer be using SHA-1 to sign OCSP responses. This was interpreted as Compliance confirming the ballot requirements had been met; however, the implementation was not verified.

2022-06-01

  • 00:00 – CA/Browser Forum ballot SC-53, stating "CAs MUST NOT sign OCSP responses using the SHA-1 hash algorithm", became effective. As such, Entrust.net Certification Authority (2048) and Entrust Root Certification Authority were non-compliant from this point.

2024-02-03

  • 17:00 – Operations reviewed OCSP Watch and saw an error reported for the Entrust.net Certification Authority (2048) root OCSP responder. Operations then checked the OCSP monitoring system and found the same issue with the Entrust Root Certification Authority root OCSP responder. No other errors were found.

2024-02-05

  • 18:00 – Confirmed authorization to fix the error.
  • 19:00 – Tested the fix in the staging environment and investigated the root cause.
  • 20:30 – Submitted a change request for approval.

2024-02-06

  • 21:00 – Applied the fix to production.
  • 21:15 – OCSP monitoring was updated to confirm that the corrected OCSP responders sign with SHA-256.
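The dates in the updated timeline give the length of the exposure directly. A short Python check, taking all times as UTC as stated above:

```python
from datetime import datetime, timezone

UTC = timezone.utc
sc53_effective = datetime(2022, 6, 1, 0, 0, tzinfo=UTC)  # SC-53 effective
detected = datetime(2024, 2, 3, 17, 0, tzinfo=UTC)       # OCSP Watch review
fixed = datetime(2024, 2, 6, 21, 0, tzinfo=UTC)          # fix applied to production

undetected = detected - sc53_effective    # non-compliant but unnoticed
noncompliant = fixed - sc53_effective     # total non-compliant period
print(undetected.days, noncompliant.days)  # 612 615
```

That is, the responders were non-compliant for about 612 days before detection (roughly 1.7 years, consistent with the "over 1.5 years" noted earlier in the thread) and about 615 days in total.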

Mozilla policy now requires CAs to perform a Compliance Self-Assessment annually. Had Entrust previously done a self-assessment, either following the CCADB framework or your own? What steps does Entrust take to confirm current operations meet requirements?

Entrust performs an annual self-assessment based on the CCADB framework. With regard to the TLS BRs, the framework confirms that the policy is stated in the disclosed documents; it is not a technical assessment. We also perform similar assessments for requirements and policies not covered by the CCADB self-assessment, such as Apple policy, Microsoft policy, and Adobe Approved Trust List (AATL) requirements.

Our Operations teams perform quality checks when changes are made, such as a ballot change. Monitoring would also be updated to continually check compliance with many requirements.

The playbooks were updated, but the Entrust.net Certification Authority (2048) and Entrust Root Certification Authority were missed.

This seems like it should be the focus of your actual root cause analysis. Why were these CAs treated differently? What happens during your Policy Authority updates that could cause elision of whole hierarchies?

(In reply to honest_enteropneust from comment #8)

The playbooks were updated, but the Entrust.net Certification Authority (2048) and Entrust Root Certification Authority were missed.

This seems like it should be the focus of your actual root cause analysis. Why were these CAs treated differently?
Those roots were treated differently because they supported Code Signing and Timestamping subordinate CAs, and Entrust was trying to extend SHA-1 support for legacy relying parties up to the sunset date. The issue was that some customers had code-signed applications embedded in POE/Edge devices that could not easily be updated. The goal was to extend SHA-1 support as close to the sunset date as possible; however, the exception was not closed before the deadline.

What happens during your Policy Authority updates that could cause elision of whole hierarchies?
The purpose of the Policy Authority meeting was to get the status of all open ballots, and the SC-53 status was deemed complete. We will follow up with an action for the Operational Authority to ensure each ballot update is complete, including any temporary exceptions.

We are adding the following action to ensure granted exceptions are addressed and ballot deadlines are followed.

| Action Item | Kind | Due Date |
|---|---|---|
| Ticketing system for the PKI operational team will be updated to track exceptions and deadlines for ballots and browser policy compliance. All exceptions will be ticketed and tracked through completion. | Prevention | 2024-05-31 |
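The core of such exception tracking is a rule that an open exception becomes visible as its ballot deadline approaches, rather than being closed only when someone remembers it. A minimal Python sketch, assuming a hypothetical ticket structure (field names, ticket IDs, and the 30-day warning window are illustrative, not Entrust's system):

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ComplianceException:
    # Hypothetical ticket for a temporary exception granted against a ballot deadline.
    ticket_id: str
    scope: str          # the CA or responder the exception covers
    requirement: str    # the ballot or policy, e.g. "SC-53"
    deadline: date      # date the requirement takes effect
    closed: bool = False


def needs_attention(tickets: List[ComplianceException], today: date,
                    warn_days: int = 30) -> List[str]:
    """Return IDs of open tickets that are past, or within warn_days of, their deadline."""
    return [
        t.ticket_id
        for t in tickets
        if not t.closed and (t.deadline - today).days <= warn_days
    ]
```

Under this sketch, an open SC-53 exception for the Entrust Root Certification Authority responder would have been flagged on 2022-05-18, the day the Policy Authority marked SC-53 as closed.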

If there are no other comments, we request that the next update be set to 3 May 2024.

So, I ran a couple of very basic searches:

There have been multiple bugs open here in the past mentioning this tool. Why did it take you until 2024-02-03 to notice it? Am I right to assume that you do not have any procedures in place to monitor incidents involving other CAs, learn from them, and apply those lessons to your own CA?

Once you noticed this issue, why did it take you until 2024-02-09 to file this incident?

An initial report should be filed within 72 hours of the CA Owner being made aware of the incident. If a full incident report is not yet ready, CA Owners should provide a preliminary report containing an executive summary of the incident and a date by which the full report will be posted.

Your timeline also fails to mention the 2024-02-09 date.

Based on this, I think there is another incident here for failing to provide the initial report within 72 hours.
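Taking the 2024-02-03 17:00 UTC detection in the timeline as the moment Entrust was made aware (an assumption), the 72-hour guidance can be checked with simple date arithmetic. The filing time on 2024-02-09 is not stated in the bug, so this Python sketch uses the earliest possible moment of that day:

```python
from datetime import datetime, timedelta, timezone

INITIAL_REPORT_WINDOW = timedelta(hours=72)  # CCADB guidance ("should", not "must")

aware = datetime(2024, 2, 3, 17, 0, tzinfo=timezone.utc)
initial_report_due = aware + INITIAL_REPORT_WINDOW
print(initial_report_due.isoformat())  # 2024-02-06T17:00:00+00:00

# Exact filing time is unknown; midnight is the earliest it could have been.
filed_earliest = datetime(2024, 2, 9, 0, 0, tzinfo=timezone.utc)
print(filed_earliest > initial_report_due)  # True
```

Even under the most favourable reading of the filing time, the report came about two and a half days after the 72-hour window closed.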

(In reply to amir from comment #12)

Once you noticed this issue, why did it take you until 2024-02-09 to file this incident?

Based on this, I think there is another incident here for failing to provide the initial report within 72 hours.

Hi Amir, thank you for the questions. We will address and follow up.

Bruce.

Whiteboard: [ca-compliance] [ocsp-failure] → [ca-compliance] [ocsp-failure] Next update 2024-05-03

(In reply to amir from comment #12)

So, I ran a couple of very basic searches:

There have been multiple bugs open here in the past mentioning this tool. Why did it take you until 2024-02-03 to notice it? Am I right to assume that you do not have any procedures in place to monitor incidents involving other CAs, learn from them, and apply those lessons to your own CA?

We do have procedures in place to monitor incidents from other CAs. Unfortunately, in the case of OCSP, we did not proactively test our endpoints with the listed tools as we considered those issues already addressed. Going forward, we will ensure that our operations team is informed about tools that have successfully identified issues for other CAs, enabling us to conduct testing ourselves and, whenever feasible, ongoing monitoring.

Once you noticed this issue, why did it take you until 2024-02-09 to file this incident?

An initial report should be filed within 72 hours of the CA Owner being made aware of the incident. If a full incident report is not yet ready, CA Owners should provide a preliminary report containing an executive summary of the incident and a date by which the full report will be posted.

Your timeline also fails to mention the 2024-02-09 date.

Based on this, I think there is another incident here for failing to provide the initial report within 72 hours.

After consulting with CCADB Support, we've clarified the necessary timeline for initial incident reports. While the CCADB Policy recommends filing incident reports within 24 hours, it doesn't require CAs to do so within 72 hours of the CA Owner's awareness of the incident. We've complied with CCADB and root program requirements by submitting a comprehensive incident report within two weeks. However, we acknowledge the value of providing an initial report promptly for transparency. To address this, we'll assess and establish guidelines for when to issue initial incident reports and incorporate them into our procedures.

We do have procedures in place to monitor incidents from other CAs.

That's great! Can you please provide the triage logs Entrust kept for the following incidents?

What I'm looking for in the logs:

  • Date that the incident was triaged
  • Findings from the triage process
  • If any action items came from the triage

After consulting with CCADB Support, we've clarified the necessary timeline for initial incident reports.

I'm trying to see where this discussion happened. I do not see any threads in https://groups.google.com/a/ccadb.org/g/public, nor do I see anything here: https://groups.google.com/a/mozilla.org/g/dev-security-policy

It would also help future CAs to know what the clear language from CCADB is, and for transparency's sake it would help to know who gave you such guidance.

The referenced language is from https://www.ccadb.org/cas/incident-report, which states in part:

An initial report should be filed within 72 hours of the CA Owner being made aware of the incident. If a full incident report is not yet ready, CA Owners should provide a preliminary report containing an executive summary of the incident and a date by which the full report will be posted. The full incident report must be posted within two weeks of the incident.

For transparency, I responded to Entrust as the on-rotation CCADB Support representative including the following (emphasis mine):

the CCADB Policy imposes no requirement upon CAs to file an initial report nor to do so within 72 hours of the CA Owner being made aware of the incident. However, this is intended as guidance which is compatible with, but not authoritative over and above, individual Root Store requirements of Root Store Operators participating in the CCADB.

This response limits itself to pertinent information from the guidance published by the CCADB as it is not the CCADB's goal nor purpose to mediate between CAs and individual Root Programs. Noting that, the CCADB Steering Committee very much welcomes community input (e.g. via Bugzilla: https://bugzilla.mozilla.org/enter_bug.cgi?product=CA+Program&component=Common+CA+Database).

Thanks Clint for the transparency in information. Personal opinion, but in the future I hope CAs try to have these conversations on the public mailing lists to reduce the confusion in such areas.

I'm not sure I agree with Entrust's assertion that this happened per the policies of root program requirements:

We've complied with CCADB and root program requirements by submitting a comprehensive incident report within two weeks.

Mozilla states

When a CA operator fails to comply with any requirement of this policy - whether it be a misissuance, a procedural or operational issue, or any other variety of non-compliance - the event is classified as an incident and MUST be reported to Mozilla as soon as the CA operator is made aware.

Emphasis: "as soon as the CA operator is made aware.", which, well, I guess depends on the definition of "as soon as", but I don't think ~6 days would fall into that definition.

Beyond this, there's really nothing on Mozilla's root program that states an incident should be made within 2 weeks, other than linking to https://www.ccadb.org/cas/incident-report

In that:

An initial report should be filed within 72 hours of the CA Owner being made aware of the incident. If a full incident report is not yet ready, CA Owners should provide a preliminary report containing an executive summary of the incident and a date by which the full report will be posted.

I read this as "An initial report SHOULD..." (RFC lingo). I know the BRs use SHOULD/MUST/MAY language as well, so maybe this page can also be drafted as such. But I do understand that this guidance is not meant to override what the root programs intend.

(In reply to amir from comment #15)

We do have procedures in place to monitor incidents from other CAs.

That's great! Can you please provide the triage logs Entrust kept for the following incidents?

What I'm looking for in the logs:

  • Date that the incident was triaged
  • Findings from the triage process
  • If any action items came from the triage

We do not have triage logs to share. Triage of the above incidents would not have triggered a concern related to our OCSP deprecation incident. We do agree that testing using OCSP Watch would have found our incident earlier.

We have added monitoring using OCSP Watch. We will also review our CA incident review process, and as stated above, we will ensure that our operations team is informed about tools that have successfully identified issues for other CAs.

If you do not have any logs for the triage, then how exactly are you monitoring Bugzilla for incidents?

Is there formality to the monitoring? How are new topics discovered and assigned?

(In reply to amir from comment #20)

If you do not have any logs for the triage, then how exactly are you monitoring Bugzilla for incidents?

Is there formality to the monitoring? How are new topics discovered and assigned?

We monitor Bugzilla through a module subscription, similar to how we are registered to the Mozilla dev-security-policy mailing list and the CCADB public mailing list. When topics are discovered in monitoring, they are assigned for follow-up as appropriate.

they are assigned for follow up as appropriate.

Were the topics I linked ever assigned out?

How do you make sure that the work of assigning out actually happens? Is there any system to follow up to ensure a topic has been seen?

(In reply to amir from comment #22)

they are assigned for follow up as appropriate.

Were the topics I linked ever assigned out?

Of the bugs you posted, 4 of 5 were triaged and deemed previously addressed or not requiring further follow-up.

The fifth -- https://bugzilla.mozilla.org/show_bug.cgi?id=1763203 -- was reviewed 25 May 2022, and should have been assigned out for investigation, but was not.

How do you make sure that the work of assigning out actually happens? Is there any system to follow up to ensure a topic has been seen?

If an incident has been triaged for investigation, then a request is made to the appropriate team and followed up by the compliance team.

We have a process to triage the incidents from other CAs, but currently do not have a programmatic system. We are investigating a new system which may help us to track the lifecycle, assignment and follow up of CA incidents.
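A triage log that captures the fields requested earlier in this bug (triage date, findings, action items) also makes the missed hand-off for bug 1763203 detectable. A minimal Python sketch of such a record and a follow-up check; the structure and names are hypothetical and not a description of the system Entrust is evaluating:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class TriageRecord:
    # Hypothetical log entry: triage date, findings, and resulting action items.
    bug_id: int
    triaged_on: date
    findings: str
    needs_investigation: bool
    assigned_to: Optional[str] = None
    action_items: List[str] = field(default_factory=list)


def unassigned_investigations(log: List[TriageRecord]) -> List[int]:
    """Bugs triaged as needing investigation that were never assigned to a team."""
    return [r.bug_id for r in log if r.needs_investigation and r.assigned_to is None]
```

Running this check on a schedule would have surfaced bug 1763203, reviewed on 25 May 2022 and flagged for investigation but never assigned.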

We have no updates for this week and will continue to monitor the bug.

We have no updates for this week and will continue to monitor the bug.

Per comment #10, setting next action for May 31, 2024.

Whiteboard: [ca-compliance] [ocsp-failure] Next update 2024-05-03 → [ca-compliance] [ocsp-failure] Next update 2024-05-31
| Action Item | Kind | Due Date |
|---|---|---|
| Ticketing system for the PKI operational team will be updated to track exceptions and deadlines for ballots and browser policy compliance. All exceptions will be ticketed and tracked through completion. | Prevention | Done |

All actions are complete. Requesting to close this incident.

Whiteboard: [ca-compliance] [ocsp-failure] Next update 2024-05-31 → [ca-compliance] [ocsp-failure]

All actions are complete. Request that this incident is closed.

I'll close this on or about Friday, 19-July-2024.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED