Closed Bug 1538673 Opened 5 years ago Closed 5 years ago

Consorci AOC: EC-SECTORPUBLIC insufficient serial number entropy

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fferre, Assigned: fferre)

Details

(Whiteboard: [ca-compliance] [ov-misissuance] [ev-misissuance])

Steps to reproduce:

Issue SSL certificates with insufficient serial number entropy

Actual results:

Issued certificates had 63 bits instead of 64 bits of entropy

Expected results:

Issued certificates should have had at least 64 bits

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On 2019-03-22 15:00 CET, after following closely ongoing discussions and incident reports published on mozilla.dev.security.policy about 64 bit entropy for serial number generation, we started investigating our systems for possible violation of BR v.1.6.3 §7.1.

  1. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2019-03-22 16:00 CET – identified that our Systems were affected. Certificates issued by Consorci AOC's "CN=EC-SectorPublic" are affected. Preventive stop of certificate issuance
2019-03-22 16:00 CET – started investigation on feasible solutions.

We will begin to evaluate scenarios for the eventual replacement of the certificates during this week.

  1. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

We preventively stopped the issuance of SSL certificates on 22-March-2019 16:00.
Today we will update our systems in the TEST environment to use 128-bit serial numbers.

  1. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

All SSL certificates issued are affected. 2,258 are still valid.

  1. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

https://crt.sh/?Identity=%25&iCAID=8050

2,258 are still valid.

  1. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We used the default serial number generation implemented by EJBCA and were sure that we were fulfilling the requirements. Any CA using EJBCA with the default settings would encounter this issue (and therefore be in violation of BR §7.1).
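
A rough way to spot affected certificates can be sketched in Python, under stated assumptions: a positive serial whose DER INTEGER encoding occupies 8 octets or fewer has at most 63 variable bits, so it cannot carry 64 bits of CSPRNG output. The helper names and the sample serials below are hypothetical illustrations, not EJBCA code and not values taken from crt.sh.

```python
def der_integer_octets(value: int) -> int:
    """Octets needed to encode a non-negative integer as a DER INTEGER."""
    # DER uses two's complement, so one extra bit is reserved for the sign.
    return value.bit_length() // 8 + 1


def too_short_for_64_random_bits(serial_hex: str) -> bool:
    """True if the serial's DER encoding is 8 octets or fewer."""
    value = int(serial_hex.replace(":", ""), 16)
    return der_integer_octets(value) <= 8


# Hypothetical sample serials, made up for illustration only.
samples = [
    "7f:ff:ff:ff:ff:ff:ff:ff",           # encodes in 8 octets
    "4a1b2c3d4e5f60718293a4b5c6d7e8f9",  # encodes in 16 octets
]
for serial in samples:
    print(serial, too_short_for_64_random_bits(serial))
```

This is only a length check. It says nothing about the quality of the random source, and a longer serial is not proof of sufficient entropy.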

  1. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Configure our systems in TEST to issue certificates with 128-bit serial numbers.
Run tests during this week.
We will update this ticket when we switch to PRODUCTION and restart issuance.

Assignee: wthayer → fferre
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(fferre)
Whiteboard: [ca-compliance]

(In reply to Francesc Ferrer from comment #1)

On 2019-03-22 15:00 CET, after following closely ongoing discussions and incident reports published on mozilla.dev.security.policy about 64 bit entropy for serial number generation, we started investigating our systems for possible violation of BR v.1.6.3 §7.1.

This is a substantial delay from when the issue was first discussed, and certainly well-after a number of incident reports were provided.

Given that all CAs participating in the Mozilla program are required to monitor mozilla.dev.security.policy and be aware of these discussions, please both explain why Consorci AOC failed to do so in a timely manner, and what steps are being taken to ensure timely awareness and investigation going forward.

  1. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

In considering your responses to this question, please conduct a more thorough analysis about the examination Consorci AOC made about EJBCA's default configuration, what other examinations have been performed regarding EJBCA's configuration, and going forward, how Consorci AOC will ensure on a timely basis that their configuration settings are appropriate and match the expectations of the Baseline Requirements?

(In reply to Ryan Sleevi from comment #2)

(In reply to Francesc Ferrer from comment #1)

On 2019-03-22 15:00 CET, after following closely ongoing discussions and incident reports published on mozilla.dev.security.policy about 64 bit entropy for serial number generation, we started investigating our systems for possible violation of BR v.1.6.3 §7.1.

This is a substantial delay from when the issue was first discussed, and certainly well-after a number of incident reports were provided.

Given that all CAs participating in the Mozilla program are required to monitor mozilla.dev.security.policy and be aware of these discussions, please both explain why Consorci AOC failed to do so in a timely manner, and what steps are being taken to ensure timely awareness and investigation going forward.

From our point of view, the issue is, perhaps, the most controversial one regarding BR compliance and there is not yet consensus that 63 bits of entropy is a security issue. We honestly think that in fact this is not a security matter, nor an interoperability issue, but a compliance one.

After a first look into our certificates database, we realized that ALL of them were affected.

The certificates issued by Consorci AOC are either OV or EV, and, therefore, extra verifications are in place before issuance compared to DV certificates.

Impact on customers was also assessed. We also had to check how a longer serial number would affect our clients' systems.

We serve the Catalan public sector and government, and it was hard to suddenly stop issuing certificates that still do not represent a security issue for our clients.

Despite all of the above, our steering committee ultimately decided to realign with the BRs as soon as possible. Please find below the actions already taken and those still to be taken.

Therefore, we consider that Consorci AOC didn't fail but decided to wait for a consensus that did not arrive while studying the impact and steps to be taken.

  1. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

In considering your responses to this question, please conduct a more thorough analysis about the examination Consorci AOC made about EJBCA's default configuration, what other examinations have been performed regarding EJBCA's configuration, and going forward, how Consorci AOC will ensure on a timely basis that their configuration settings are appropriate and match the expectations of the Baseline Requirements?

We reviewed EJBCA's default serial number settings to determine whether we were using them.
We checked our TEST and PRODUCTION environments and found that both were using the default setting.
We've updated our TEST environment to the latest version of the EJBCA 6 branch so that a longer serial number can be configured on a per-CA basis.
We plan to update the PRODUCTION environment tomorrow and resume issuance.
We've configured a serial number length of 128 bits (127 bits of entropy), far exceeding the BR requirements.
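
The 64/63 and 128/127 figures follow from how X.509 serial numbers are encoded. A minimal sketch, assuming the serial is drawn as a positive integer that must fit in a fixed number of octets (the function names are hypothetical and this is not EJBCA's actual code): the top bit of the first octet is the sign bit of the DER INTEGER and must be 0, so an n-octet serial carries at most 8n - 1 random bits.

```python
import secrets


def max_entropy_bits(num_octets: int) -> int:
    """Random bits available in a positive serial of num_octets octets."""
    # The sign bit of the DER INTEGER must be 0 for a positive value.
    return 8 * num_octets - 1


def random_positive_serial(num_octets: int) -> int:
    """Draw a non-zero serial that fits in num_octets as a positive integer."""
    while True:
        serial = secrets.randbits(max_entropy_bits(num_octets))
        if serial != 0:  # serial numbers must be greater than zero
            return serial


print(max_entropy_bits(8))   # 63
print(max_entropy_bits(16))  # 127
```

Under these assumptions an 8-octet serial falls one bit short of the 64 bits required by BR §7.1, while a 16-octet serial leaves a wide margin.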

Flags: needinfo?(fferre)

(In reply to Francesc Ferrer from comment #3)

(In reply to Ryan Sleevi from comment #2)

(In reply to Francesc Ferrer from comment #1)

On 2019-03-22 15:00 CET, after following closely ongoing discussions and incident reports published on mozilla.dev.security.policy about 64 bit entropy for serial number generation, we started investigating our systems for possible violation of BR v.1.6.3 §7.1.

This is a substantial delay from when the issue was first discussed, and certainly well-after a number of incident reports were provided.

Given that all CAs participating in the Mozilla program are required to monitor mozilla.dev.security.policy and be aware of these discussions, please both explain why Consorci AOC failed to do so in a timely manner, and what steps are being taken to ensure timely awareness and investigation going forward.
<snip>
Therefore, we consider that Consorci AOC didn't fail but decided to wait for a consensus that did not arrive while studying the impact and steps to be taken.

I do not consider this to be an acceptable answer to the questions posed, nor does it work to instill confidence in Consorci AOC's incident handling and responsiveness. Neither the Baseline Requirements nor Mozilla Policy permit CAs to make discrimination of their incident response based on their assessment of security versus compliance. In particular, from when this issue was first reported [1], and from when the first incident reports began being reported [2], to the dedicated thread about this matter [3], Consorci AOC should have been examining the systems and similarly providing incident reports.

I again repeat:

  1. Please explain why Consorci AOC failed to monitor and respond in a timely manner
  2. What steps are being taken to ensure timely awareness and investigation going forward

Regardless of Consorci AOC's views on the severity of this, the significant delay in disclosing and reporting represents a serious concern regarding the ongoing operations, as does the lack of a clear and concrete timeline. The incident analysis needs to consider how Consorci AOC can improve in both of these regards going forward, and thus requires further introspection and analysis than presently demonstrated. We need to be confident that a similar misunderstanding will not occur in the future, and thus understanding what steps are being taken to prevent that, and to engage in a timely fashion, are necessary to resolve this issue. Consorci AOC is not the only CA being called out for the delay in response and reporting.

[1] https://groups.google.com/d/msg/mozilla.dev.security.policy/nnLVNfqgz7g/u1-0eQ2yAAAJ / https://groups.google.com/d/msg/mozilla.dev.security.policy/nnLVNfqgz7g/4s26CTfOBQAJ
[2] https://groups.google.com/d/msg/mozilla.dev.security.policy/-RB8ovYgOHE/HeciPTGGAQAJ
[3] https://groups.google.com/d/msg/mozilla.dev.security.policy/nlN_QrDwgaw/cg_v-VY0AQAJ

Flags: needinfo?(fferre)

(In reply to Ryan Sleevi from comment #4)

Dear Ryan,

Let us be more specific.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On 2019-03-15 we first became aware of the problem via the m.d.s.p. group.

  1. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

On 2019-03-15 - we became aware of the issue and decided to keep issuing certificates while the discussion on its severity was still ongoing.
On 2019-03-22 15:00 CET - after a week of closely following the ongoing discussions and incident reports published on mozilla.dev.security.policy about 64-bit entropy for serial number generation, and taking into account that the severity had been accepted by the community, we started investigating our systems for a possible violation of BR v.1.6.3 §7.1.
On 2019-03-22 16:00 CET - identified that our systems were affected. Certificates issued by Consorci AOC's "CN=EC-SectorPublic" are affected. Preventive stop of certificate issuance.
On 2019-03-22 16:00 CET - we stopped accepting new issuance requests, but one request had already been accepted and the client downloaded the certificate on the 26th (https://crt.sh/?id=1318507258). This is the last certificate issued with a 64-bit serial number.
On 2019-03-22 16:00 CET - started investigating feasible solutions.
On 2019-03-25 09:00 CET - the board decided to increase the serial number length to 128 bits.
On 2019-03-26 09:00 CET - we deployed the changes to our PRE-production environment.
On 2019-03-27 14:00 CET - we updated the PRODUCTION environment and resumed the issuance of certificates.

Regarding your questions:

  1. Please explain why Consorci AOC failed to monitor and respond in a timely manner
    The lack of people following the m.d.s.p. group.
  2. What steps are being taken to ensure timely awareness and investigation going forward
    We will add two more people, Ms. Anna Giné and Mr. Xavier Llebaria, to the m.d.s.p. group and to CCADB as points of contact.

Thank you,

Flags: needinfo?(fferre)

Other CAs have reported receiving notice from EJBCA on March 3 (e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1542302#c0 )

Did you receive such notice? If not, could you explain why not? If you did, could you explain why it took another 11 days?

In terms of understanding root causes and failures, I don't think we've clearly identified the root causes yet. As noted previously, all CAs participating in Mozilla's program were expected to monitor m.d.s.p., and acknowledged it as such. Understanding why there was a lack of people following the m.d.s.p. group, in spite of that, is important to understanding how future issues will not happen. Please help explain why there was a lack of such monitoring, given past notifications and commitments.

Flags: needinfo?(fferre)

(In reply to Ryan Sleevi from comment #6)

Other CAs have reported receiving notice from EJBCA on March 3 (e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1542302#c0 )

Did you receive such notice? If not, could you explain why not? If you did, could you explain why it took another 11 days?

First of all, thank you for letting us know about these notifications. In our case, no such notice arrived from PrimeKey. We have contacted PrimeKey (EJBCA) to make sure we receive them as part of the Enterprise Support license we pay for.

In terms of understanding root causes and failures, I don't think we've clearly identified the root causes yet. As noted previously, all CAs participating in Mozilla's program were expected to monitor m.d.s.p., and acknowledged it as such. Understanding why there was a lack of people following the m.d.s.p. group, in spite of that, is important to understanding how future issues will not happen. Please help explain why there was a lack of such monitoring, given past notifications and commitments.

Consorci AOC considers that there were two root causes in this case: too few people monitoring the list, and an incident management procedure that was not followed correctly. Resources are always limited and, from our point of view, having one person monitoring m.d.s.p. seemed to be enough. In addition, evaluation and remediation were performed before the incident was reported.

For the two root causes identified, these are the improvements that Consorci AOC has put in place to prevent such issues from happening again:

  • Improve detection: we have increased the resources dedicated to monitoring m.d.s.p. and related incidents. Another measure is subscribing to the support channels of critical software vendors such as EJBCA (as stated above).
  • Incident management procedure enforcement: we will enforce the applicable incident management procedure so that the incident is registered before the evaluation and containment stages.
Flags: needinfo?(fferre)

It appears that all remediation has been completed.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ov-misissuance] [ev-misissuance]