Closed Bug 1534429 Opened 1 year ago Closed 1 year ago

Camerfirma: Multicert SSL CA 001: Insufficient serial number entropy

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ca.forum, Assigned: ca.forum)

Details

(Whiteboard: [ca-compliance])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0

Steps to reproduce:

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
On 2019-03-11 09:00 WET, after reviewing ongoing discussions and incident reports published on mozilla.dev.security.policy about 64-bit entropy for serial number generation, we started investigating our systems for a possible violation of BR v.1.6.3 §7.1.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
2019-03-08 17:00 WET – identified relevant ongoing discussions on m.d.s.p and incident reports published.
2019-03-11 09:00 WET – started investigation whether the issue affected our systems.
2019-03-11 14:00 WET – conclusion of investigation is that certificates issued by our MULTICERT SSL Certification Authority 001 (MTC SSL CA 001) are affected by this issue, having only 63 bits of effective entropy. Development of fixes started immediately.
We are testing the fixes under the QA environment. Correction is planned to be deployed in production on 2019-03-12 at 13:00 WET.
We are carefully evaluating scenarios for the replacement of the certificates – in the last 4 months, all of our SSL customers have gone through at least one enforced certificate replacement (some of them have had 3 changes).

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
Certificate issuance was stopped after concluding that we were affected by the issue and will be resumed after the fix is rolled out in production.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
All certificates issued from MTC SSL CA 001 (https://crt.sh/?caid=84368) are affected by this issue. There are currently a total of 924 active non expired certificates, issued between 2018-10-17 15:12 WET and 2019-03-11 12:45 WET.

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
We are attaching a CSV file to this report with the affected certificates.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
The issue is due to a flaw in the serial number generation algorithm that changes the leftmost bit to 0 to force the serial number to be positive.
The issue is undetectable by lint tools and can only be found by source code inspection or statistical tests (over a large population).
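
The effect described above can be illustrated with a minimal sketch. It assumes the generator drew 64 random bits and then cleared the most significant bit; the function names are illustrative and are not taken from Multicert's code. Each individual serial is a perfectly valid positive integer, which is why a linter cannot flag it, while the bias only shows up across a large sample.

```python
import secrets

def flawed_serial() -> int:
    """Hypothetical reconstruction: 64 random bits with the leftmost bit forced to 0."""
    raw = secrets.randbits(64)
    return raw & ~(1 << 63)  # only 63 bits remain random

def top_bit_frequency(serials, width=64):
    """Fraction of serials whose most significant bit (of `width` bits) is set."""
    serials = list(serials)
    return sum((s >> (width - 1)) & 1 for s in serials) / len(serials)

sample = [flawed_serial() for _ in range(10_000)]
print(top_bit_frequency(sample))  # 0.0 for the flawed generator
```

An unbiased 64-bit generator would give a frequency close to 0.5 on the same test, so a value of exactly 0 over thousands of certificates is a strong signal of the flaw.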

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
Serial number size will be increased to include a minimum of 120 bits of entropy.
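
One way such a change could look is sketched below. This is an assumption about a possible approach, not the fix Multicert deployed: it draws 128 random bits, clears the sign bit so the value stays positive (leaving 127 random bits, above the 120-bit target) and rejects zero. The result fits in 16 octets, within the 20-octet serial number limit of RFC 5280.

```python
import secrets

RANDOM_BITS = 128  # drawn from the OS CSPRNG

def new_serial() -> int:
    """Return a positive serial number carrying 127 random bits (illustrative sketch)."""
    while True:
        value = secrets.randbits(RANDOM_BITS)
        value &= (1 << (RANDOM_BITS - 1)) - 1  # clear the top bit so the DER INTEGER is positive
        if value != 0:  # serial numbers must be greater than zero
            return value
```

Drawing more bits than the target leaves headroom, so clearing the sign bit no longer pushes the effective entropy below the required minimum.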

Assignee: wthayer → ca.forum
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: Multicert SSL CA 001: Insufficient serial number entropy → Camerfirma: Multicert SSL CA 001: Insufficient serial number entropy
Whiteboard: [ca-compliance]

(In reply to ca.forum from comment #0)

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
2019-03-08 17:00 WET – identified relevant ongoing discussions on m.d.s.p and incident reports published.
2019-03-11 09:00 WET – started investigation whether the issue affected our systems.
2019-03-11 14:00 WET – conclusion of investigation is that certificates issued by our MULTICERT SSL Certification Authority 001 (MTC SSL CA 001) are affected by this issue, having only 63 bits of effective entropy. Development of fixes started immediately.
We are testing the fixes under the QA environment. Correction is planned to be deployed in production on 2019-03-12 at 13:00 WET.

2019-03-12 13:09 WET - the fix was deployed in production and certificate issuance was resumed.

We are carefully evaluating scenarios for the replacement of the certificates – in the last 4 months, all of our SSL customers have gone through at least one enforced certificate replacement (some of them have had 3 changes).

We are carefully evaluating scenarios for the replacement of the certificates – in the last 4 months, all of our SSL customers have gone through at least one enforced certificate replacement (some of them have had 3 changes).

Does this mean the certificates were not revoked? Judging by examples - https://crt.sh/?id=932614195 - it appears not to be. https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation details the expectations and policy around revocation. Is it a correct statement to state that Camerfirma/Multicert plans to not revoke these certificates? I just want to make sure I have a clear understanding of the response, for future reference.

Flags: needinfo?(ca.forum)

Here follows an update:

2019-03-20: 21 certificates revoked
2019-03-22: 13 certificates revoked
2019-06-19: 48 certificates revoked

We have now 732 active certificates (110 have expired meanwhile).

The plan for certificate replacement is the following:

  1. There are 46 certificates that will expire in the next 60 days. Those certificates are already (or soon will be) in the renewal period window and thus we believe it would be misleading for the customer to receive parallel communications of both renewal and certificate replacement. Therefore, we intend to skip those certificates.
  2. For the remaining 686, we will be sending replacement notifications by July 1st and monitoring the replacement process. In case the certificates are not replaced in due time, we will be contacting the customers directly with the following priority:
    1. “Large accounts” (at our scale) – the assumption is that we will be able to replace a large number of certificates with fewer contacts
    2. Certificates with longer expiry date first – certificates with closer expiry dates will eventually enter the renewal window, therefore we intend to concentrate efforts on certificates with longer validity.
Flags: needinfo?(ca.forum)

So there was zero progress for three months? That is, from a CA, patently unacceptable and negligent. I must emphasize that the lack of response, combined with the failure to proactively take steps to ensure timely revocation, suggests a CA that is neither willing nor able to abide by the Baseline Requirements. A failure to contact the customer is not an acceptable reason to prolong revocation, and certainly not at this scale.

Similarly, I take this response to mean that Multicert will ignore the BRs for up to 60 days (or longer), when convenient.

I cannot emphasize enough how poorly this incident report reflects upon Camerfirma and Multicert, when the entire point is to allow the CAs involved to demonstrate how they are beyond reproach and how they can effectively manage incidents of all scales with urgency, professionalism, and a multi disciplined approach to ensure any issues that may reasonably cause delays are systemically addressed going forward.

As a sub-CA, these responses equally reflect on the parent CA, and similarly affect its trust and reputation going forward. I want to encourage you to revisit this report, using the principles outlined in Responding to an Incident Report, and make sure you are satisfied that you have addressed them. While Mozilla cannot force revocation on a technical level, and similarly, cannot grant exceptions to the BRs, it absolutely will maintain a public record of how well a CA handles itself in the face of an incident.

Flags: needinfo?(ca.forum)
Flags: needinfo?(eusebio.herrera)

As we write, Multicert is issuing new certificates (with a minimum of 120 bits of entropy) to replace all those affected by this issue, and by July 1st 2019 all of the affected certificates will have been either replaced with new ones (entropy >= 120 bits) or revoked.

We had been giving our remaining clients more time to replace their certificates but, in light of the recent discussions, we are accelerating our plan and will complete it entirely by July 1st.

Flags: needinfo?(ca.forum)
Whiteboard: [ca-compliance] → [ca-compliance] - Next Update - 01-July 2019

We developed a scanning tool to monitor progress on certificate substitution. For each affected certificate it generates 3 possible results:

  • REPLACED - the certificate presented in the TLS connection is not the affected certificate. We assume the customer has already replaced the certificate.
  • STILL SAME CERT - the certificate presented in the TLS connection is still the affected certificate. We assume the customer has not yet replaced the certificate.
  • UNREACHABLE - no TLS connection could be established (possible reasons: domain not publicly accessible, server shutdown, temporary network failure, etc). We can't check by automated means whether the certificate has been replaced.
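
The three outcomes above can be sketched as follows. This is a minimal illustration, not the actual scanning tool: it assumes the affected certificate is identified by the SHA-256 fingerprint of its DER encoding (as listed on crt.sh), and the names `fetch_leaf_der` and `classify` are hypothetical.

```python
import hashlib
import socket
import ssl
from typing import Callable, Optional

def fetch_leaf_der(host: str, port: int = 443, timeout: float = 5.0) -> Optional[bytes]:
    """Return the DER leaf certificate served by host, or None if TLS cannot be established."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # we only need to see which certificate is presented
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.getpeercert(binary_form=True)
    except OSError:  # covers timeouts, refused connections and ssl.SSLError
        return None

def classify(host: str, affected_sha256: str,
             fetch: Callable[[str], Optional[bytes]] = fetch_leaf_der) -> str:
    """Map one affected certificate to REPLACED, STILL SAME CERT or UNREACHABLE."""
    der = fetch(host)
    if der is None:
        return "UNREACHABLE"
    if hashlib.sha256(der).hexdigest() == affected_sha256.lower():
        return "STILL SAME CERT"
    return "REPLACED"
```

Passing the fetch function as a parameter keeps the classification logic testable without network access. As in the description above, REPLACED only means that a different certificate was presented on that connection.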

Here is the last status of the day:

  • REPLACED: 400
  • STILL SAME CERT: 143
  • UNREACHABLE: 173 (although this value should be stable, we've been observing values floating between 171-173)

Although the certificates with status REPLACED can presumably be revoked, we plan to revoke them all at once on July 1st. This is to avoid any unexpected outage over the weekend in case the scanning tool has some nasty bug.

Progress update:

  • REPLACED: 435
  • STILL SAME CERT: 110
  • UNREACHABLE: 171

While operations are ongoing, here follows a description of the context and rationale of our approach:

  1. We analyzed this issue from the very beginning, as soon as we became aware of it, shortly after it was announced and began to be discussed publicly on m.d.s.p. We immediately took action and promptly reported the incident on March 11th (we were the 9th CA out of 26 to report a similar issue). This report predated the statistical analysis of serial numbers per CA that was published on March 14th; our incident report had already hinted at such an analysis («The issue is undetectable by lint tools and can only be found by source code inspection or statistical tests (over a large population).»).
  2. On 12th March we changed our algorithm and started issuing certificates with 120 bits minimum entropy. Between the time we confirmed we were affected by the issue and the moment we patched our systems, certificate issuance was halted.
  3. In late November 2018 we forced a general certificate substitution on our SSL customers due to a change in the certificate hierarchy. Additionally, some further substitutions were required due to misissuances, which forced customers to replace their certificates more than once in a very short time frame. The feedback was obviously negative and the operational (and commercial) impact was significant.
  4. Therefore, when in March we understood that a new general certificate replacement would have to be rolled out, the prospects were not pleasant. We analyzed the situation in depth:
    1. From our security evaluation and the discussion in m.d.s.p, it was clear this was more of a standards compliance issue than an actual exploitable security vulnerability (there are no known pre-image attacks on SHA-256 and none is foreseen as likely in the next 825 days; ETSI TS 119 312 v1.3.1 §8.3 expects SHA-256 to be suitable for the next 6 years).
    2. We replaced all of the certificates on systems under our control in the following days.
    3. From the experience of the previous general certificate replacement, we wanted to give some time to our customers. There are several certificates protecting critical systems in the areas of healthcare, banking, government. Even if the systems are critical, the hard truth is that the level of readiness and maturity for such a change in short time is not there yet.
    4. After the previous traumatic process in November 2018, a second, almost consecutive general certificate replacement proved challenging (to say the least).

With this context, our original plan was to give our customers a little relief from the general replacement in November 2018 and perform a progressive rollout.

However, in light of the feedback and the need to abide by the requirements, it was decided to fast-track the process and set July 1st as the revocation date for all certificates with this issue. On June 27th we reissued all (but one*) of the affected certificates, sent out a notification message and started one-to-one communications with customers.

In these last days we have already compiled some lessons learned and took the occasion to help and instruct our customers:

  1. Some customers in the Banking sector are doing certificate pinning on the app (rather than key pinning). They claim they won’t be able to release a new app version in due time (further to the changes and repacking, there is also the time taken for the app store programs to review the release, which may take up to 1 week in some platforms). Also, the pace at which end users update apps is out of the control of these Banks. We have already instructed them to move from certificate pinning to key pinning and will be working with them in the next days about ways to have multiple key pinning to better cope with future cases that may require change of keys. We understand that for these customers there are several steps in the procedure that are out of their direct control.
  2. A large number of our customers still enter the email address of an individual in the technical contact details. We are instructing customers to enter internal mailing list or ticket management addresses to avoid problems with people leaving the organization or absent staff.
  3. One customer has a few dozen certificates and cannot quickly substitute all of them. We are evaluating solutions for certificate automation to recommend to our customers and possibly integrate with our systems.
  4. *One customer added a CAA record for another CA on the top domain. This is preventing us from reissuing the certificate. We are working with the customer to adjust the DNS settings.
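
The difference between certificate pinning and key pinning mentioned in item 1 can be illustrated with a toy model. `ToyCert` is a hypothetical stand-in, not a real X.509 parser: it only captures the fact that a reissued certificate gets a new serial number (so its encoding and fingerprint change) while it may keep the same public key. All byte values are made up for the example.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyCert:
    """Stand-in for an X.509 certificate; real code would parse the DER encoding."""
    serial: int
    spki: bytes  # SubjectPublicKeyInfo bytes (hypothetical value)

    def encoded(self) -> bytes:
        return self.serial.to_bytes(16, "big") + self.spki

def cert_pin(cert: ToyCert) -> str:
    """Certificate pin: hash over the whole certificate."""
    return hashlib.sha256(cert.encoded()).hexdigest()

def key_pin(cert: ToyCert) -> str:
    """Key pin: hash over the public key only."""
    return hashlib.sha256(cert.spki).hexdigest()

old = ToyCert(serial=0x1234, spki=b"customer-public-key")
reissued = ToyCert(serial=0x9ABCDEF, spki=b"customer-public-key")  # same key, new serial
backup_key = b"customer-backup-public-key"

app_cert_pins = {cert_pin(old)}
app_key_pins = {key_pin(old), hashlib.sha256(backup_key).hexdigest()}  # multiple key pins

print(cert_pin(reissued) in app_cert_pins)  # False: the app would need a new release
print(key_pin(reissued) in app_key_pins)    # True: the reissued certificate is still accepted
```

In this model a key pin survives reissuance only when the replacement certificate carries a pinned key, which is why the pin set includes a backup key.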

New update after the 12:00 WEST deadline communicated to customers:

  • REPLACED: 474
  • STILL SAME CERT: 67
  • UNREACHABLE: 175

Revocation is planned for tonight 23:30 WEST.

Meanwhile, we have 2 particular cases of Banks that are doing certificate pinning (instead of key pinning). After continuous dialogue with them, we find that they did all they could to accelerate the process (new app versions were submitted for review Friday and Saturday evening). Further to the time needed for the review, the pace of update by end users is typically slow. In further releases the apps will be changed to use key pinning instead of certificate pinning. Therefore, we intend to hold the revocations of the following certificates until 2019-07-08 12:00 WEST:

Batch revocation completed.

From the 713 certificates:

  • 705 certificates revoked
  • 7 certificates expired meanwhile
  • 1 further exception granted for https://crt.sh/?id=968326363. The customer could not update the CAA record, which is preventing us from reissuing the certificate. We will be closely following the case and supporting the customer, and hope to revoke the certificate on July 2nd.
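
Why a CAA record on the top domain blocks reissuance for a subdomain can be sketched as below. This is a simplified reading of the CAA lookup in RFC 8659: it climbs from the requested name towards the root and uses the first non-empty record set, and it ignores CNAMEs, `issuewild` and critical flags. The zone dictionary and the CA identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical zone data: domain -> list of (tag, value) CAA properties.
CaaZone = Dict[str, List[Tuple[str, str]]]

def relevant_caa(domain: str, zone: CaaZone) -> List[Tuple[str, str]]:
    """Climb from domain towards the root and return the first non-empty CAA set."""
    labels = domain.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        records = zone.get(".".join(labels[i:]), [])
        if records:
            return records
    return []

def may_issue(domain: str, ca_id: str, zone: CaaZone) -> bool:
    """True if the CA identified by ca_id may issue for domain under this simplified model."""
    issue_values = [v for tag, v in relevant_caa(domain, zone) if tag == "issue"]
    if not issue_values:
        return True  # no "issue" property: issuance is not restricted
    return any(v.split(";")[0].strip().lower() == ca_id.lower() for v in issue_values)

zone = {"customer.example": [("issue", "other-ca.example")]}
print(may_issue("www.customer.example", "our-ca.example", zone))    # False
print(may_issue("www.customer.example", "other-ca.example", zone))  # True
```

Until the customer adds the issuing CA to the record set (or removes the restriction), a compliant CA has to refuse the reissuance request.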

To make sure: All outstanding certificates have been revoked? Can you confirm that the certificate mentioned in Comment #10 was revoked?

(In reply to Ryan Sleevi from comment #11)

To make sure: All outstanding certificates have been revoked? Can you confirm that the certificate mentioned in Comment #10 was revoked?

Yes, all outstanding certificates were revoked.

Certificate from comment #10 was revoked on July 3rd and the 3 certificates from comment #9 were revoked on July 8th around 17:00 WEST (since those certificates were related to mobile banking apps, revocation was scheduled to be after the daily limit at which money transfer orders are executed in the next business day).

Moving forward, and concerning customers using pinning to protect their systems, I believe the fulfillment of the 5-day window is currently challenging for the following reasons:

  1. new app releases depend on the review processes of app stores, which may take up to 1 week
  2. the most popular pinning SDKs for apps only support certificate pinning, not public key pinning. For that reason, new feature requests were submitted for the following SDKs:
    1. Cordova Advanced HTTP
    2. PhoneGap SSL Certificate Checker plugin

Thanks. I do want to draw attention to the fact that public key pinning is also fraught with danger, in that public keys themselves may need to be rotated or revoked (e.g. in the event of key compromise). Fundamentally, pinning, to certificates or keys, is dangerous for risks such as this; as one of the co-authors of the public key pinning spec, this is a reason why we've since heavily discouraged it.

Has Camerfirma/Multicert considered actively discouraging pinning, by explaining to their customers the risks that may be involved if

  • The leaf certificate has to be revoked
  • The intermediate certificate has to be revoked
  • The leaf key is compromised
  • The CA needs to rotate its root key / stand up a new root?

And reaffirmed to its Customers/Subscribers that the CA is obligated to revoke within 24 hours to 5 days in the event of a misissuance or incident?

While I look forward to your answers to these questions, also setting N-I for Wayne, as I believe this represents the completion of the primary remediation steps, and now we're discussing opportunities for improved mitigations going forward.

Flags: needinfo?(wthayer)

(In reply to Ryan Sleevi from comment #13)

Thanks. I do want to draw attention to the fact that public key pinning is also fraught with danger, in that public keys themselves may need to be rotated or revoked (e.g. in the event of key compromise). Fundamentally, pinning, to certificates or keys, is dangerous for risks such as this; as one of the co-authors of the public key pinning spec, this is a reason why we've since heavily discouraged it.

Has Camerfirma/Multicert considered actively discouraging pinning, by explaining to their customers the risks that may be involved if

  • The leaf certificate has to be revoked
  • The intermediate certificate has to be revoked
  • The leaf key is compromised
  • The CA needs to rotate its root key / stand up a new root?

Thanks for raising these risks. During our contacts with customers, we have spotted some of them as well. Nevertheless, I believe we can think of ways to mitigate all but the 3rd risk, if doing key pinning instead of certificate pinning.

I am not a strong supporter of key pinning either, but I do recognize its merits and the appeal to be used in advanced threat models (the kinds of mobile banking, sensitive personal information, etc).

To the best of my knowledge, there is currently no alternative as powerful as key pinning. The technique is there and many SDKs are available, including native support in Android through the Network Security Configuration.

Therefore, if BR-compliant publicly trusted certificates present challenges to key pinning, my guess is that it will soon start to be done bypassing revocation checks or with private certificates (homemade, self-signed, etc). Does this lead to a safer Internet? (Disclaimer: there would be of course also a loss of market for CAs, but honestly speaking that is the least of my concerns here).

Before going out to customers advocating against key pinning, I propose this discussion to be moved to m.d.s.p.

And reaffirmed to its Customers/Subscribers that the CA is obligated to revoke within 24 hours to 5 days in the event of a misissuance or incident?

That is what they have heard the most in the past days, and we will be communicating it ever more clearly and loudly. Hopefully the whole community will do so jointly - to be effective, it has to be a collective effort.

While I look forward to your answers to these questions, also setting N-I for Wayne, as I believe this represents the completion of the primary remediation steps, and now we're discussing opportunities for improved mitigations going forward.

To capture where we stand:
Comment #4 captures a request to Camerfirma regarding their supervision of Multicert
Comment #13 captures the request to Wayne regarding his review of this incident report
Comment #14 captures that Multicert wants to discuss key pinning best practices on m.d.s.p. for how best to advise their customers

It appears that all questions have been answered and remediation is complete.

Comment #14 captures that Multicert wants to discuss key pinning best practices on m.d.s.p. for how best to advise their customers

I am not aware of Multicert having done so, but I encourage Multicert or Camerfirma to ask this question on m.d.s.p.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(wthayer)
Flags: needinfo?(eusebio.herrera)
Resolution: --- → FIXED
Whiteboard: [ca-compliance] - Next Update - 01-July 2019 → [ca-compliance]

Yesterday Multicert created a new topic in m.d.s.p.
https://groups.google.com/forum/#!forum/mozilla.dev.security.policy
