Closed Bug 1676440 Opened 4 years ago Closed 4 years ago

NetLock: Cumulative report connected to EV verification

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: varga.viktor, Assigned: varga.viktor)

Details

(Whiteboard: [ca-compliance] [ev-misissuance])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36

Steps to reproduce:

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On Oct 26th, Mr. Ben Wilson, who was working on our EV validation, sent a long list of possible problems for review. After Mr. Wilson's questions were answered, 2 problems remained that should be reported as incidents.

  1. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Case 1 – BR certificates must be 398 days in validity or less
2020-08-31 – In our DV system, the configuration for 2-year (730-day) certificates was changed to 365 days. Unfortunately, one certificate request (quintessenz.hu) was already in progress, and in that state it kept the 2-year configuration value for the CT certificate.
2020-09-01 – The new rule limiting validity to no more than 398 days came into force.
2020-09-28 16:13-14 - The customer resumed the order and pressed the Next button twice. Each time, only the CT certificate was issued.
2020-09-29 - SSL certificates valid for more than 398 days were disabled in code in the PROD environment.
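
The Case 1 limit can be expressed as a small pre-issuance check. The following is a minimal sketch, not NetLock's actual code: the function name `validity_within_limit` and the constant `MAX_VALIDITY` are hypothetical, and the sketch assumes the validity period includes both the notBefore and notAfter instants, with a day counted as 86,400 seconds.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant for this sketch: the 398-day limit from Case 1.
MAX_VALIDITY = timedelta(days=398)


def validity_within_limit(not_before: datetime, not_after: datetime) -> bool:
    """Return True if the validity period is at most 398 days."""
    # Both endpoints are counted (assumption stated above), so one second
    # is added to the plain difference between the two timestamps.
    inclusive_period = (not_after - not_before) + timedelta(seconds=1)
    return inclusive_period <= MAX_VALIDITY


# Illustrative values only: a 365-day and a 730-day request.
issued = datetime(2020, 9, 28, 16, 13, tzinfo=timezone.utc)
one_year_ok = validity_within_limit(issued, issued + timedelta(days=365))
two_years_ok = validity_within_limit(issued, issued + timedelta(days=730))
```

Running the check at signing time, rather than only when the order is created, would also cover requests that were started under an older configuration.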

Case 2 – RSA keys must have a modulus size divisible by 8
2020-06-02 - Our customer requested a DV certificate with a 4092-bit key. Because the key check in use does not raise an error on renewal, this was not identified. (https://www.easy-shop.hu/)
2020-10-26 - Mr. Ben Wilson kindly reported the problem.
2020-10-28 - Temporary blocking of key sizes other than 2048 and 4096 bits.
2020-10-28 - The customer generated a new 4096-bit key, a new certificate was issued, and the faulty certificate was revoked. It is not possible to revoke a CT certificate, so that certificate is still valid.
2020-11-10 - The final blocking code will be published in the PROD environment.
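
The Case 2 rule is simple to state in code. This is a minimal sketch under stated assumptions, not the blocking code deployed by NetLock: the function name `rsa_modulus_size_ok` is hypothetical, and the modulus is passed as a plain integer instead of being parsed from a certificate request.

```python
MIN_RSA_BITS = 2048  # minimum RSA modulus size assumed for this sketch


def rsa_modulus_size_ok(modulus: int) -> bool:
    """Return True if the modulus is at least 2048 bits and divisible by 8."""
    bits = modulus.bit_length()
    return bits >= MIN_RSA_BITS and bits % 8 == 0


# Illustrative moduli with an exact bit length (not real RSA keys).
modulus_4092 = (1 << 4091) | 1
modulus_4096 = (1 << 4095) | 1

size_4092_ok = rsa_modulus_size_ok(modulus_4092)
size_4096_ok = rsa_modulus_size_ok(modulus_4096)
```

A 4092-bit modulus fails because 4092 leaves a remainder of 4 when divided by 8, while 4096 passes.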

  1. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

We are not issuing certificates with these problems anymore.

  1. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Case 1
2 CT certificates
Case 2
1 valid certificate and 1 CT certificate

  1. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
    Case 1
    https://crt.sh/?id=3439519295
    https://crt.sh/?id=3439519332
    Case 2
    https://crt.sh/?id=2891671522
    Valid certificate in the attachments.

  2. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
    Case 1
    The test of this combination was missing from the test cases. Because of this, the problem was not identified until it occurred.
    Case 2
    The key parameter check did not work correctly on renewal in the DV system. This key was generated a long time ago and was repeatedly renewed. Also, the test of this combination was missed from the test cases.

  3. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

In both cases, new test cases and technical controls make it impossible to repeat these errors:
Case 1
2020-09-29 - SSL certificates valid for more than 398 days were disabled in code in the PROD environment.
Case 2
2020-11-10 – The final blocking code will be published in the PROD environment.

Assignee: bwilson → varga.viktor
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Summary: Cumulative report connected to EV verification → Netlock: Cumulative report connected to EV verification
Whiteboard: [ca-compliance]

Ben: Case 1 here seems to also be outlined in bug 1676367, should this bug be left to discuss case 2?

Flags: needinfo?(bwilson)

Yes.

Also, the test of this combination was missed from the test cases.

As the point of these incident reports is not merely to understand whether you've resolved the incident, but to apply "systems thinking" and look for patterns and root causes, it's clear that there's a root cause here that you failed to identify: A failure to appropriately consider tests that could prevent issues.

There are hundreds of resources available for performing root cause analysis, in myriad languages, but starting with the most basic question:

Why weren't there tests to begin with?

Flags: needinfo?(bwilson) → needinfo?(varga.viktor)

In this case, the original test plan was based on the assumption that the data within the system was already verified data, so it could not be erroneous. This is similar to the other control problem.
As a result, the 8-divisibility check in the system did not work when renewing requests submitted before the check was deployed.
An update on 2020-11-10 fixed this bug.

For the sake of future avoidance, it has been clarified that when defining test cases, special attention should also be paid to testing data combinations that we believe are unlikely to occur.
The root cause is that these test cases were missing from the test design, which led to these implementation errors not being detected.
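
One way to cover "unlikely" combinations is to enumerate them instead of picking cases by hand. The sketch below is illustrative only and does not describe NetLock's test suite: `may_issue`, `key_size_ok`, and the list of key sizes are hypothetical. It runs the same key check for new requests and renewals, so stored data gets no exemption.

```python
from itertools import product


def key_size_ok(bits: int) -> bool:
    """RSA modulus size must be at least 2048 bits and divisible by 8."""
    return bits >= 2048 and bits % 8 == 0


def may_issue(modulus_bits: int, is_renewal: bool) -> bool:
    """Hypothetical issuance gate for this sketch."""
    # is_renewal is deliberately ignored: previously accepted data is
    # re-checked exactly like freshly submitted data.
    return key_size_ok(modulus_bits)


# Every key size is tested on both paths, including odd sizes such as 4092.
KEY_SIZES = [2047, 2048, 3072, 4092, 4096]
CASES = list(product(KEY_SIZES, [False, True]))


def run_matrix() -> dict:
    """Return the issuance decision for every (key size, renewal) pair."""
    return {(bits, renewal): may_issue(bits, renewal) for bits, renewal in CASES}


results = run_matrix()
```

Because the matrix is generated, adding a new key size or a new request state extends the coverage without anyone having to judge whether the combination is likely.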

Flags: needinfo?(varga.viktor)

Not the response I was hoping for, but I suspect not much more is to be expected. This was entirely preventable: there are ample past CA incidents that demonstrated this same failure and that, had NetLock been aware of them, would have prevented this (e.g. CAs in the past have forgotten to test these edge cases).

However, I don't know if much more can be said here. NetLock can and should think carefully about how it follows m.d.s.p. and incidents, and how it evaluates its issuance pipeline. This was entirely foreseeable, and is thus not reasonably excusable, but it seems unlikely we'll make any further progress in protecting users in this bug. Assigning to Ben to see if he has anything to add.

Flags: needinfo?(bwilson)

I have nothing to add and believe that this matter should be closed with the understanding that Netlock will work diligently to stay on top of compliance in the area of pre-issuance linting and certificate issuance. I will close this bug on or about next Wednesday, 27-Jan-2021 unless there are other items to discuss.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Summary: Netlock: Cumulative report connected to EV verification → NetLock: Cumulative report connected to EV verification
Whiteboard: [ca-compliance] → [ca-compliance] [ev-misissuance]