Closed Bug 1649961 Opened 4 years ago Closed 4 years ago

Actalis: Incorrect OCSP Delegated Responder Certificate

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ryan.sleevi, Assigned: adriano.santoni)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Attachments

(1 file)

2.02 KB, application/x-x509-ca-cert

The following was originally reported to m.d.s.p. at https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg13493.html

Actalis has issued one or more OCSP Delegated Responders, as defined within RFC 6960, Section 2.6 and Section 4.2.2.2, without including the id-pkix-ocsp-nocheck extension, as required by the Baseline Requirements, Version 1, Section 13.2.5 through Version 1.7.0, Section 4.9.9.

Example certificate: https://crt.sh/?id=1298407200

Please provide an incident report, including the timeline for revocation.

I have just checked, and it does not seem to me that the OCSP responder's certificate is lacking the id-pkix-ocsp-nocheck extension. Maybe I made the wrong test, but then I did not understand what the problem is.

Attached file ocsp_responder.cer

OCSP Responder's certificate.

The linked post explains more details.

The certificate linked is an OCSP Responder cert, despite that most likely not being what you meant to do. The issue exists with the certificate itself.

Flags: needinfo?(adriano.santoni)

Ryan, bear with me: I have read your post, but I still don't understand. We use delegated OCSP responders, so we have to comply with BR §4.9.9 which mandates inclusion of the id-pkix-ocsp-nocheck extension in the OCSP responder's certificate, and that's what we do. I have just tested using OpenSSL in the following way:

openssl ocsp -issuer ActalisAuthenticationRootCA.cer -cert 1298407200.crt -url http://ocsp05.actalis.it/VA/AUTH-ROOT -text

then I parsed the OCSP responder's certificate (just to make doubly sure), attached, and I found that the id-pkix-ocsp-nocheck extension is there. I am probably doing the wrong test, so please explain to me how to do the right one.

Adriano: The following certificate is a delegated OCSP responder: https://crt.sh/?id=1298407200

It does not have the id-pkix-ocsp-nocheck extension, but it has an id-kp-OCSPSigning EKU, making it a Delegated responder that is misissued.

It is that specific certificate, not the responder for that certificate, that is misissued, because that certificate is itself a responder.
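The condition described above can be expressed as a small check. This is a minimal sketch, not a linter: it assumes the EKU values and extension OIDs have already been extracted from the certificate as dotted strings, and the function name and sample OID lists are hypothetical. The two OID constants are the standard values for id-kp-OCSPSigning and id-pkix-ocsp-nocheck.

```python
# Standard OIDs; the helper below is a hypothetical illustration.
ID_KP_OCSP_SIGNING = "1.3.6.1.5.5.7.3.9"       # id-kp-OCSPSigning (EKU value)
ID_PKIX_OCSP_NOCHECK = "1.3.6.1.5.5.7.48.1.5"  # id-pkix-ocsp-nocheck (extension)


def is_misissued_delegated_responder(eku_oids, extension_oids):
    """Return True if the certificate can act as a delegated OCSP responder
    (its EKU contains id-kp-OCSPSigning) but lacks id-pkix-ocsp-nocheck."""
    acts_as_responder = ID_KP_OCSP_SIGNING in set(eku_oids)
    has_nocheck = ID_PKIX_OCSP_NOCHECK in set(extension_oids)
    return acts_as_responder and not has_nocheck


# Hypothetical intermediate CA: serverAuth + OCSPSigning in the EKU,
# with basicConstraints, keyUsage and extKeyUsage but no nocheck extension.
ica_eku = ["1.3.6.1.5.5.7.3.1", ID_KP_OCSP_SIGNING]
ica_extensions = ["2.5.29.19", "2.5.29.15", "2.5.29.37"]
print(is_misissued_delegated_responder(ica_eku, ica_extensions))
```

Note that the check looks at the certificate carrying the EKU, not at the responder that answers status queries about it, which is the distinction drawn in the comment above.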

We are investigating the issue.

This is our preliminary incident report.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

We became aware of this issue upon reading the following post to the MDSP mailing list, and the subsequent creation of this bug:

https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg13493.html

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2020-07-01 14:05 (PDT) Post on MDSP by Ryan Sleevi (see link above) describing the issue.
2020-07-01 18:46 (PDT) This bug was created and assigned to us.
2020-07-01 23:57 (PDT) We started investigating the issue. We soon realized that all of the Actalis ICAs impacted by this issue are directly managed by us, and us only: their keys are kept inside HSMs located in our own data centers (no contractors) and are manned by our staff (no contractors), protected by a multitude of security measures at all layers (physical, technical, operational).
2020-07-02 07:46 (CEST) We held a first meeting to better frame the issue and better understand its implications. On the same day, we re-analysed the applicable RFCs, Root CA programs, and CABF requirements, with regard to the presence of the EKU extension in CA certificates.
2020-07-03 14:42 (CEST) In a second meeting we decided, as our first measure, to create and deploy new ICAs to stop issuing certificates from the affected ICAs. We planned generation of new ICA keys by next Monday. (Over the weekend we carried on our analysis of the situation.)
2020-07-06 07:30 (CEST) We started generation of new ICA keys and corresponding certificates.
2020-07-06 17:00 (CEST) A special task force was formed for discussing our further measures, with special regard to the security risks, and the first meeting of this task force was held. We started analyzing in detail the actual security risk deriving from OCSPSign being included in the EKU extension of ICAs, and what controls we could adopt to mitigate the residual risk of our ICAs being abused.
2020-07-07 09:00 (CEST) We started configuration and testing of certificate status services (CRL and OCSP) for the new ICAs.
2020-07-08 07:45 (CEST) We started deploying the new ICAs and re-configuration of our internal CA-RA connectors to prepare for the switchover.

  3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

We stopped issuing ICA certificates with OCSPSign in their EKU extension in 2019.

  4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

The following intermediate CA certificates are involved:

https://crt.sh/?id=1298407195&opt=cablint
https://crt.sh/?id=1298407200&opt=cablint
https://crt.sh/?id=1305420938&opt=cablint
https://crt.sh/?id=2029983391&opt=cablint
https://crt.sh/?id=3059419298&opt=cablint

All the above intermediate CA certificates were issued between 19-Mar-2019 and 20-Sep-2019.

Under three of the above five ICAs there are several thousand active OV and EV SSL Server certificates issued for critical services in sectors such as Banking, Central Government (such as Ministries), Healthcare (hospitals, testing laboratories, etc.), Telecom, Energy, National Defense, and others. There are also hundreds of thousands of DV certificates issued for a myriad of web sites of all sorts, including a large number of local government institutions. We can provide the figures later on.

  5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

See #4.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

As can be gathered from the discussion on MDSP, there was room for different interpretations, between browsers and CAs, of the combination of the applicable RFCs, Root CA programs and CABF requirements, as regards the meaning of the EKU extension in CA certificates. Looking forward, we believe it necessary to introduce suitable clarifications in the BRs and/or in the Root CA programs.

  7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

We acknowledge the risk posed by this issue. However, we believe that risk to be negligible in our case, so we do not deem it necessary to destroy the affected ICAs within a short timeframe.

As mentioned in our preliminary analysis (see #2) we believe that the security inherent in the management of our own ICA keys is essentially the same as that of our Root CA key, so the only risk we acknowledge is that of our ICA keys being abused (to generate OCSP responses) by some member of our Operations team, as they have administrative access to the servers comprising our CA infrastructure.

Our company is certified ISO 27001 and all the trust services we provide, also certified as eIDAS compliant, are subject to annual third-party audits according to recognized auditing criteria.

To resolve the compliance issue, we are already deploying new ICAs, with new keys. So in the next few days we will start issuing all new end-entity certificates from these new ICAs. At the same time, we are going to disable the certificate issuing functionality of the CA software instances that use the affected ICA keys. Only the CRL issuing functionality will be left active.

However, we understand that in this particular circumstance that may not be perceived by the community as a sufficient assurance that we will not "mess" with our own ICA keys. So we will share in the next few days the specific technical and operational measures we are putting in place to prevent abuse of our ICA keys by our own personnel, with the same level of assurance that applies to our other key management processes. We are preparing a detailed technical explanation that will be corroborated by an independent assessment. Without going into the details that will be provided in the aforementioned document, which is in progress, we list below some of the controls that are already in place or that will soon be put in place to mitigate the risk:

  • HSM configuration continuous monitoring - by our Security department (independent from Operations) through the AUDIT feature available on our HSM - Monitoring of the HSM configuration, in particular as regards the list of client IPs allowed to open connections with the HSM, log on and log off and system changes, triggering alarms in case an unexpected change is detected;
  • HSM signature continuous monitoring - by our Security department - Monitoring of the number of signatures made by the HSM hosting the dismissed ICA keys and comparison with the number of CRLs actually issued by the ICA software (which, after the switchover to the new ICAs, will be configured to sign CRLs only); this will also check the expected total number and time of signatures, and will trigger alarms in case of unexpected events;
  • Root OCSP Responder continuous monitoring - by an external and independent organization with a SOC engaged by our Security department - Monitoring of the OCSP responses returned by our Root OCSP responder, to check that they are signed by the expected OCSP responder certificate (and not by an ICA), and triggering alarms in case of unexpected events;
  • IPS with packet inspection and dropping (still under verification) - based on an IPS system managed by our Network team (independent from Operations) - with the ability to drop OCSP responses if the certificate used to sign them was one of the affected ICAs.
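As an illustration of the first control in the list above (HSM configuration monitoring), here is a minimal sketch of an allow-list comparison. It is not Actalis' tooling: it assumes the list of client IPs allowed to connect to the HSM can be exported as plain strings, and the function names and the 192.0.2.x addresses (a documentation range) are hypothetical.

```python
def diff_allowed_clients(approved, observed):
    """Compare the approved HSM client allow-list with the one currently observed.

    Returns (unexpected_additions, unexpected_removals) as sorted lists.
    """
    approved_set, observed_set = set(approved), set(observed)
    additions = sorted(observed_set - approved_set)
    removals = sorted(approved_set - observed_set)
    return additions, removals


def should_alarm(approved, observed):
    """Any difference from the approved configuration is treated as an alarm."""
    additions, removals = diff_allowed_clients(approved, observed)
    return bool(additions or removals)


# Hypothetical data: one client has been added without approval.
approved_clients = ["192.0.2.10", "192.0.2.11"]
observed_clients = ["192.0.2.10", "192.0.2.11", "192.0.2.99"]
print(diff_allowed_clients(approved_clients, observed_clients))
print(should_alarm(approved_clients, observed_clients))
```

Treating removals as alarms too is a design choice made for this sketch; the comment itself only singles out unexpected changes in general.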

We are already involving our external auditor for assessing our preliminary risk analysis and arranging for certification of the described controls.

(In reply to Adriano Santoni from comment #7)

Under three of the above five ICAs there are several thousand active OV and EV SSL Server certificates issued for critical services in sectors such as Banking, Central Government (such as Ministries), Healthcare (hospitals, testing laboratories, etc.), Telecom, Energy, National Defense, and others. There are also hundreds of thousands of DV certificates issued for a myriad of web sites of all sorts, including a large number of local government institutions. We can provide the figures later on.

As part of this, as covered in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation , you need to consider two useful things.

Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable.
You will perform an analysis to determine the factors that prevented timely revocation of the certificates, and include a set of remediation actions in the final incident report that aim to prevent future revocation delays.

The examples mentioned focus only on TLS certificates, and it's unclear if that's the full scope of impact. If it is, this response is deeply concerning, because there's been ample discussion about the expectations and need to replace TLS certificates quickly and easily. This also does not seem to match the expectation of providing the rationale on a "per-Subscriber basis", as mentioned in that page.

Any delay of revocation needs to come with a comprehensive plan to address this, such as automation, or a recognition that there will be no further delays. Actalis knows how critical this is, because of the significant delays encountered with Bug 1534295, which led to Bug 1572638, and which promised changes going forward. I'm not sure how this is not simply a repeat of the incident, and it suggests that none of the improved process was actually useful.

That's concerning, but if there's data being omitted, that's something y'all need to share.

Our company is certified ISO 27001 and all the trust services we provide, also certified as eIDAS compliant, are subject to annual third-party audits according to recognized auditing criteria.

Sure, but neither of these provides any guarantees for the problems/concerns mentioned. Anyone can be "audited" - the level of assurance rests on the strength of the audit scheme and the consistency and expertise of the auditors, which are globally-recognized areas of weakness for ETSI, even by ENISA itself. So you're right that it's not perceived as sufficient, and I'm glad you acknowledge that :)

  • Root OCSP Responder continuous monitoring - by an external and independent organization with a SOC engaged by our Security department - Monitoring of the OCSP responses returned by our Root OCSP responder, to check that they are signed by the expected OCSP responder certificate (and not by an ICA), and triggering alarms in case of unexpected events;
  • IPS with packet inspection and dropping (still under verification) - based on an IPS system managed by our Network team (independent from Operations) - with the ability to drop OCSP responses if the certificate used to sign them was one of the affected ICAs.

Note that, while mitigations, these aren't very robust mitigations, since an attacker can staple an OCSP response or otherwise intercept the request to the root responder in targeted cases. That is, while I think they're valuable and useful to demonstrate the steps Actalis is taking, I just want to acknowledge that they're weak-to-non-existent mitigations. It will be the demonstration of other mitigations as part of the report that I think will be more useful for Relying Parties.

We also created bug https://bugzilla.mozilla.org/show_bug.cgi?id=1651651 to track the delayed revocation.

We have fixed and approved our plan to revoke the 5 affected ICA certificates and destroy the related keys.

We have also tested and planned all the actions described in Comment #7 except for "IPS with packet inspection and dropping" that we abandoned because, as remarked by Ryan, that would be a weak mitigation action and would also negatively impact the overall performance of our networking infrastructure.

About the other mitigation actions, we have this plan:

HSM configuration continuous monitoring and HSM signature continuous monitoring:

We tested them successfully and plan to have them fully deployed in production by July 27.

With these controls implemented, our security team will be able to check:

  • every 5 minutes, whether there is an unplanned change of the HSM configuration, especially regarding the addition of new clients authorized to connect to the HSM;
  • every 24 hours, whether the number of signatures made by the private keys of the five affected ICAs differs from the number of CRLs known to have been produced in that 24-hour interval.
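The 24-hour check described above amounts to reconciling two counts per key. The following is a minimal sketch under stated assumptions: that the HSM audit log can be reduced to one key label per signature event in the window, and that the CA software can report how many CRLs it issued per key. The function name, key labels and sample figures are hypothetical.

```python
from collections import Counter


def reconcile_signatures(signature_events, crls_issued):
    """Compare signatures seen in the HSM audit log with CRLs issued.

    signature_events: iterable of key labels, one entry per signature
                      recorded in the audit log for the window.
    crls_issued:      dict mapping key label -> number of CRLs the CA
                      software reports for the same window.
    Returns a dict of key label -> (signatures, crls) for every mismatch.
    """
    signed = Counter(signature_events)
    mismatches = {}
    for key in set(signed) | set(crls_issued):
        n_signatures = signed.get(key, 0)
        n_crls = crls_issued.get(key, 0)
        if n_signatures != n_crls:
            mismatches[key] = (n_signatures, n_crls)
    return mismatches


# Hypothetical window: ICA-2 signed one more object than the CRLs account for.
events = ["ICA-1", "ICA-1", "ICA-2", "ICA-2", "ICA-2"]
crls = {"ICA-1": 2, "ICA-2": 2}
print(reconcile_signatures(events, crls))
```

A count comparison like this only shows that something extra was signed, not what it was, so a non-empty result would be a trigger for investigation rather than a diagnosis.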

Our auditor (CAB) was formally engaged with all the necessary documentation and the onsite audit is planned on the 30th and 31st of July. The inspection report is forecasted to be issued one week later, including all the technical details, analysis and performed tests.

Hereafter, a short technical description of the controls implemented and the related process; more details will be provided in the evaluation report of the auditor:

  • We exploit the strong audit capabilities of our HSMs (Thales Luna SA 5), with a specific audit profile reserved to our security team
  • Our security team has defined and approved the audit configuration in terms of level, details and destination of the HSM audit logs (this configuration can be changed only by the security team, but they can access the HSM only with approval from the Operations team, which provides good segregation of duties)
  • The HSM audit logs will be collected on a machine under sole control of our security team
  • Two monitoring scripts have been developed that automatically check the HSM audit logs and send alerts to our SOC console on detecting unwanted/unexpected events
  • In case of an alert, the security team will open a security incident using our internal normal procedure and immediately start to investigate
  • A daily report of the signatures made by the specific keys is also scheduled, for human review.

Root OCSP Responder continuous monitoring

We have tested the procedure and are signing a contract with an external company that will be charged with executing it at regular intervals and triggering alarms in case of unexpected results. We plan to have this check in production by July 27.

Our external partner, from its own monitoring systems, will send an OCSP request to our Root OCSP Responder every 60 seconds and check that the response is signed by the expected OCSP Responder certificate and not by one of the affected ICAs. In case of an OCSP response not signed by the expected certificate, the partner SOC will send an alert to our SOC and the security team will open an incident and start investigating. This process will not be certified by our auditor as we believe that the assurance provided by using a 3rd party is enough for this type of check.
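The decision step of the external check described above can be sketched as a fingerprint comparison. This is an illustration only: it assumes the signer certificate has already been extracted from the OCSP response (BasicOCSPResponse.certs) as DER bytes, which is not shown here, and the function names, result strings and placeholder byte strings are hypothetical.

```python
import hashlib


def sha256_fingerprint(der_bytes):
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def classify_signer(signer_cert_der, expected_fingerprint, affected_ica_fingerprints):
    """Decide whether a monitoring probe should raise an alarm."""
    fingerprint = sha256_fingerprint(signer_cert_der)
    if fingerprint == expected_fingerprint:
        return "ok"
    if fingerprint in affected_ica_fingerprints:
        return "alarm: signed by an affected ICA"
    return "alarm: unexpected signer"


# Placeholder bytes stand in for real DER certificates.
responder_der = b"placeholder-root-ocsp-responder-cert"
ica_der = b"placeholder-affected-ica-cert"
expected = sha256_fingerprint(responder_der)
affected = {sha256_fingerprint(ica_der)}

print(classify_signer(responder_der, expected, affected))
print(classify_signer(ica_der, expected, affected))
```

In the setup described in the comment, a probe like this would run every 60 seconds from the partner's systems, and any result other than "ok" would be forwarded to the SOC.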

Regarding the plan for reissuing certificates and eventually revoking the affected ICAs, we have contacted all our customers and are proceeding.

Considering the strong mitigation actions described and certified (we believe the residual risk of misuse of the ICAs' private keys is negligible), the pandemic situation (we have many customers in the Healthcare sector) with many companies who have workers on standby, and the number of impacted end-entity certificates (more than 430,000 to be reissued), we have defined and approved the following revocation plan, to which we commit:

  • 80% by the 30th of August
  • 98% by the 30th of September
  • 99.99% by the 30th of October
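To put the milestones above in absolute terms, here is a small worked calculation. It takes 430,000 as a round figure (the comment says "more than" that number), so the results are lower-bound estimates, not figures provided by Actalis; the names used are hypothetical.

```python
TOTAL_CERTS = 430_000  # round figure; the comment says "more than" this

# Milestones in basis points (1 bp = 0.01%) to keep the arithmetic exact.
MILESTONES_BP = [("30 August", 8000), ("30 September", 9800), ("30 October", 9999)]


def remaining_after(total, revoked_bp):
    """Certificates still unrevoked once revoked_bp/10000 of total are revoked."""
    revoked = total * revoked_bp // 10_000
    return total - revoked


for deadline, bp in MILESTONES_BP:
    print(deadline, remaining_after(TOTAL_CERTS, bp))
```

Under this assumption roughly 86,000 certificates would still be unrevoked after the first milestone, 8,600 after the second, and 43 after the third.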

Revocation and key destruction for all the affected ICAs is planned by the 6th of November, the exact date still to be confirmed as ICA keys destruction will be done in the presence of our auditor.

Next update will be posted after the verification report is issued by our auditor.

(In reply to Adriano Santoni from comment #10)

With these controls implemented, our security team will be able to check:

  • every 5 minutes, whether there is an unplanned change of the HSM configuration, especially regarding the addition of new clients authorized to connect to the HSM;
  • every 24 hours, whether the number of signatures made by the private keys of the five affected ICAs differs from the number of CRLs known to have been produced in that 24-hour interval.

Our auditor (CAB) was formally engaged with all the necessary documentation and the onsite audit is planned on the 30th and 31st of July. The inspection report is forecasted to be issued one week later, including all the technical details, analysis and performed tests.

Will the report include these details? And I'm assuming this report will be shared?

Root OCSP Responder continuous monitoring

We have tested the procedure and are signing a contract with an external company that will be charged of executing it at regular intervals and triggering alarms in case of unexpected results. We plan to have this check in production by July 27.

Our external partner, from its own monitoring systems, will send an OCSP request to our Root OCSP Responder every 60 seconds and check that the response is signed by the expected OCSP Responder certificate and not by one of the affected ICAs. In case of an OCSP response not signed by the expected certificate, the partner SOC will send an alert to our SOC and the security team will open an incident and start investigating. This process will not be certified by our auditor as we believe that the assurance provided by using a 3rd party is enough for this type of check.

Could you describe the scenario you believe is being defended against? I'm not sure I understand the value of this.

Considering the strong mitigation actions described and certified (we believe the residual risk of misuse of the ICAs' private keys is negligible), the pandemic situation (we have many customers in the Healthcare sector) with many companies who have workers on standby, and the number of impacted end-entity certificates (more than 430,000 to be reissued), we have defined and approved the following revocation plan, to which we commit:

  • 80% by the 30th of August
  • 98% by the 30th of September
  • 99.99% by the 30th of October

This info belongs on the related delayed revocation bug, including the steps taken to reduce this risk in the future.

Whiteboard: [ca-compliance] → [ca-compliance] Next Update - 7-August 2020

Will the report include these details? And I'm assuming this report will be shared?

Yes, and Yes.

The inspection by our auditor was carried out on July 30 and 31. Their report is in preparation and will be shared soon.

Could you describe the scenario you believe is being defended against? I'm not sure I understand the value of this.

If an insider managed to abuse one of our ICA keys to sign one or more OCSP responses, those fake responses would include the ICA certificate itself in the BasicOCSPResponse.certs field. Otherwise, a normal OCSP user agent would not be able to validate them and therefore would not fall into the trap. So we are having an external agency continuously check that the certificate accompanying the OCSP responses from our Root OCSP responder is the expected one; otherwise an alarm is triggered.

Even if an attacker wanted to use a fake OCSP response by stapling, this response would normally still be obtained via an OCSP query addressed to our Root OCSP responder. At least, the most popular web server software, when configured for OCSP stapling, obtains OCSP responses through OCSP queries, as far as we know. There may exist loopholes and/or alternate ways of stapling OCSP responses, but they seem quite unlikely to us. There may also exist OCSP user agents that do not validate OCSP responses using the responder certificate included in them (in the BasicOCSPResponse.certs field), or that fail to do that correctly, but we do not aim at covering all possible situations with this control, which nonetheless we believe to be useful.

We believe this control itself cannot easily be circumvented by an insider. We currently receive some 80 million OCSP requests per day, targeting our Root OCSP Responder, from a large number of different (and varying) IP addresses, and our Operations team cannot tell the "real" OCSP requests (originating from actual relying parties) from the monitoring ones, as they do not know the source IP addresses of the latter (known only to our Security department). So it would be quite hard for a malicious insider (who could only be a member of Operations) to "filter" and handle differently the monitoring OCSP requests, in an attempt to fool the external monitoring agency that we have engaged.

(In reply to Adriano Santoni from comment #12)

If an insider managed to abuse one of our ICA keys to sign one or more OCSP responses, those fake responses would include the ICA certificate itself in the BasicOCSPResponse.certs field. Otherwise, a normal OCSP user agent would not be able to validate them and therefore would not fall into the trap.

I'm not sure I understand this.

The malicious insider generates a 'malicious' OCSP response (i.e. they use such an ICA key to sign a BasicOCSPResponse). They only need to exfiltrate that response, and then they can use it along with any certificate they've issued from any revoked CA.

The "server" is fully under the attacker's control, and it's easy to supply the stapled response in virtually every web server and TLS library, without having to contact your CA. They provide the malicious certificate chain (from the revoked CA), along with the malicious OCSP response they obtained.

Very little software checks the validity period of the BasicOCSPResponse. They rely on the certificate profile and the OCSP responder to not be compromised, and so will generally happily accept it for whatever validity range specified. So the malicious insider creates a validity period of, say, 5 years (because why not), for the OCSP response. Client software will happily accept that response for any time within that 5 year range.
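For contrast with the behaviour described above, here is a minimal sketch of a client-side check that bounds the validity window of a response. It is an illustration, not any particular client's logic: the thisUpdate and nextUpdate values are assumed to be already parsed into timezone-aware datetimes, and the function name, the 10-day limit and the policy of rejecting responses without nextUpdate are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

MAX_RESPONSE_LIFETIME = timedelta(days=10)  # assumed policy limit for this sketch


def is_response_window_acceptable(this_update, next_update, now,
                                  max_lifetime=MAX_RESPONSE_LIFETIME):
    """Accept a response only if it is current and not valid for too long."""
    if next_update is None:
        return False  # this sketch rejects open-ended responses
    if not (this_update <= now <= next_update):
        return False  # not yet valid, or already expired
    return next_update - this_update <= max_lifetime


now = datetime(2020, 8, 1, tzinfo=timezone.utc)
issued = datetime(2020, 7, 30, tzinfo=timezone.utc)

# A response with a 5-year window, as in the scenario above, is rejected.
print(is_response_window_acceptable(issued, issued + timedelta(days=5 * 365), now))
# A response with a 4-day window is accepted.
print(is_response_window_acceptable(issued, issued + timedelta(days=4), now))
```

A client without such a bound would accept the 5-year response at any point inside its window, which is what makes an exfiltrated response useful to the attacker.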

This could even be used to bypass Certificate Transparency in enforcing clients, by using the 'revoked' CA to 'backdate' a server certificate to be issued prior to any client enforced CT. At the time CT was first introduced in a major client (Chrome), it was only a month after the transition to 825-day certificates (March 2018). So an attacker creates a certificate, says it was issued 2018-02-28, and sets it to be 39 months, and both Google and Apple clients will accept the certificate, even though it had not been deployed to CT. Further, while they do enforce upper bounds on certificate lifetimes, because of how long certificates could be valid, this "attack" will work well into Summer 2021.

It seems that the main difference in your threat model and my threat model is that you're assuming that the malicious insider cannot exfiltrate the malicious OCSP response, and instead must serve it directly from the OCSP responder for the Root CA. But I don't see any reason to believe that's the only threat, and if the malicious insider is able to simply sign that malicious 5y OCSP response, and exfiltrate it, all of your detection will fail to detect it.

Hopefully that makes more sense about why I'm failing to see how monitoring the root mitigates it? The mitigation has to be monitoring what is signed by the ICA keys. That's also why key destruction is so important: it removes the risk of them being used to sign such a response. So, for mitigations, we have to focus on how you can be sure, be sure, that the HSMs only sign Certificates and are adequately protected and audited. This is why stopping (new) issuance from these ICAs is also so important: it lets you easily validate "The HSM has not been used to sign anything" other than, perhaps, CRLs, and you can match exactly which CRL to which signature.

Does this explain it better?

I am willing to close this bug and consolidate further discussion under Actalis' bug for delayed revocation, Bug #1651651. The comments in this bug contain many valuable disclosures and observations which are preserved for cross-reference in that bug. However, before I close this bug, I would like to understand what steps Actalis takes to ensure it is following and participating in discussions that highlight these types of issues. For instance, I assume that Actalis follows and occasionally participates in discussions about the CA incidents of other CAs in Bugzilla; changes and discussions of CA/Browser Forum Guidelines, and in particular, certificate profiles; and m.d.s.p posts, which discuss developments and interpretations of the applicable standards for CAs. (Section 2.1 of the Mozilla Root Store Policy states, "CAs MUST follow and be aware of discussions in the mozilla.dev.security.policy forum, where Mozilla's root program is coordinated. They are encouraged, but not required, to contribute to those discussions.") I think that understanding steps you take to ensure this expectation is met, in order to detect and prevent future issues, is useful to ensure that no future interpretation issues arise.

(In reply to Ryan Sleevi from comment #13)

We are aware that the monitoring of our Root OCSP Responder is a control that does not cover all possible ways a malicious insider can abuse our ICA keys. That's why we have focused on the implementation and independent assessment (by our auditor) of the other two controls:

  • monitoring the signatures made by the affected ICA keys (that is, monitoring that they have not been used to sign anything other than CRLs)
  • monitoring the configuration of the HSM (containing those ICA keys), in particular with regard to audit log settings and allowed clients

(In reply to Ben Wilson from comment #14)

We agree that many comments in this bug contain many valuable disclosures and observations which are preserved for cross-reference in Bug #1651651. Actalis has always been following the discussions in Bugzilla about the CA incidents of other CAs, the changes and discussions of CA/Browser Forum Guidelines, and the M.D.S.P. discussions on developments and interpretations of the applicable standards for CAs, as well as other related lists and forums (e.g. CT-policy). We do our best to check every day, and any time we read a post that seems important to us, we bring it to the attention of our top management for a wider analysis and, where appropriate, for planning suitable actions.

Closing - please refer to Bug 1651651.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(adriano.santoni)
Resolution: --- → FIXED
Whiteboard: [ca-compliance] Next Update - 7-August 2020 → [ca-compliance]
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ocsp-failure]