Closed Bug 1705832 Opened 3 years ago Closed 3 years ago

KIR S.A.: DV certificates with locality name, organization name and stateOrProvinceName

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: michel, Assigned: piotr.grabowski)

Details

(Whiteboard: [ca-compliance] [dv-misissuance])

Hello,
I found 2 certificates with the 2.23.140.1.2.1 policy, but with organizationName, localityName and stateOrProvinceName:
https://crt.sh/?id=3391273980&opt=zlint,ocsp (CRL Revoked, OCSP Unknown)
https://crt.sh/?id=3391286036&opt=zlint,ocsp (CRL Revoked, OCSP Revoked)
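For context, the violated rule can be sketched as a toy lint (a hypothetical Python illustration, not the actual zlint code; all names below are invented): a certificate asserting the CA/Browser Forum DV policy OID 2.23.140.1.2.1 must not carry subject attributes such as organizationName, localityName, or stateOrProvinceName, which belong to OV certificates.

```python
# Hypothetical lint sketch: a certificate asserting the CA/Browser Forum
# DV policy OID (2.23.140.1.2.1) must not contain subject fields that
# are used for OV/EV validation. This mirrors the rule the two
# certificates above violate; it is not the real zlint implementation.

DV_POLICY_OID = "2.23.140.1.2.1"

# Subject attributes that must be absent when the DV policy is asserted.
FORBIDDEN_DV_SUBJECT_FIELDS = {
    "organizationName",
    "localityName",
    "stateOrProvinceName",
}

def lint_dv_subject(policy_oids, subject_fields):
    """Return the sorted list of subject attributes that are not allowed
    alongside the DV policy OID (an empty list means the check passes)."""
    if DV_POLICY_OID not in policy_oids:
        return []  # the rule only applies to DV certificates
    return sorted(FORBIDDEN_DV_SUBJECT_FIELDS & set(subject_fields))

# The misissued certificates asserted DV but carried OV subject fields
# (illustrative values, not the actual certificate contents):
violations = lint_dv_subject(
    ["2.23.140.1.2.1"],
    {"commonName": "example.pl", "organizationName": "Example Sp. z o.o.",
     "localityName": "Warszawa", "stateOrProvinceName": "mazowieckie"},
)
# violations == ["localityName", "organizationName", "stateOrProvinceName"]
```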

Assignee: bwilson → piotr.grabowski
Status: NEW → ASSIGNED
Whiteboard: [ca-compliance]

Hello Michel,

https://crt.sh/?id=3391273980&opt=zlint,ocsp has OCSP status revoked.

Do we have to report an incident here?
The issue occurred while applying the new OIDs for compliance reasons, and before the compliance date of 2020-09-30:

2020-09-30 - BR 7.1.6.4: Subscriber Certificates MUST include a CA/Browser Forum Reserved Policy Identifier in the Certificate Policies extension

Hi Piotr,

I'm afraid you've misunderstood the incident, which is rather deeply troubling.

You included a DV OID alongside the OV fields. The 2020-09-30 date is irrelevant; you should re-read the BRs.

Flags: needinfo?(piotr.grabowski)

Hi Ryan,
I will report an incident for that issue.
As a quick summary, we have all technical controls in place.
They were deployed just after the issue occurred.
I am fully aware of the problem.

Flags: needinfo?(piotr.grabowski)

Can you also explain why KIR S.A. failed to provide an incident report on this when they revoked the certificates on 2020-09-17? As per Mozilla Root Store Policy:

When a CA fails to comply with any requirement of this policy - whether it be a misissuance, a procedural or operational issue, or any other variety of non-compliance - the event is classified as an incident. At a minimum, CAs MUST promptly report all incidents to Mozilla in the form of an Incident Report, and MUST regularly update the Incident Report until the corresponding bug is marked as resolved in the mozilla.org Bugzilla system by a Mozilla representative. CAs SHOULD cease issuance until the problem has been prevented from reoccurring.

This doesn't seem to be "prompt" and a third party had to report it to Mozilla before you provided an incident report.

Bug report:

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
KIR became aware of this by verifying the certificate content immediately after issuance, but because the issue occurred while applying the new OIDs for compliance reasons, and before the compliance date of 2020-09-30, we did not classify it as an incident, which is regrettable.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2020-09-17 07:01:57 UTC and 2020-09-17 07:04:57 UTC - The 2 certificates were issued.
2020-09-17 07:05:57 UTC - The investigation of the root cause began.
2020-09-17 07:19:37 UTC - The certificates were revoked.
2020-09-17 07:30:37 UTC - We identified the registration policy that issued the problematic certificates.
2020-09-17 08:30:00 UTC - Internal analysis finished with the conclusion that we had applied the DV OID to one of the policies with OV fields.
2020-09-17 12:30:00 UTC - The error was fixed in the registration policy.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
We stopped issuing certificates with the problem immediately after issuance by fixing the registration policy.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
https://crt.sh/?id=3391273980&opt=zlint,ocsp
https://crt.sh/?id=3391286036&opt=zlint,ocsp

5. The complete certificate data for the problematic certificates.

https://crt.sh/?id=3391273980&opt=zlint,ocsp
https://crt.sh/?id=3391286036&opt=zlint,ocsp
6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
We accidentally applied the wrong (DV) OID to one of the policies with OV fields while making changes to the registration policy for compliance reasons.

7. List of steps your CA is taking to resolve the situation and ensure it will not be repeated.

The issue was immediately fixed and cannot happen again.

(In reply to Piotr Grabowski from comment #5)

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
We accidentally applied the wrong (DV) OID to one of the policies with OV fields while making changes to the registration policy for compliance reasons.

7. List of steps your CA is taking to resolve the situation and ensure it will not be repeated.

The issue was immediately fixed and cannot happen again.

Please redo this report. This is a completely unacceptable level of detail. "We made an accident, we promise not to do it again" would not have been appropriate in 2014. It's wildly inappropriate in 2021.

Before doing so, please read https://wiki.mozilla.org/CA/Responding_To_An_Incident, and in particular

The purpose of these incident reports is to provide transparency about the steps the CA is taking to address the immediate issue and prevent future issues, both the issue that originally lead to the report, and other potential issues that might share a similar root cause. Additionally, they exist to help the CA community as a whole learn from potential incidents, and adopt and improve practices and controls, to better protect all CAs. Mozilla expects that the incident reports provide sufficient detail about the root cause, and the remediation, that would allow other CAs or members of the public to implement an equivalent solution.

For example, it’s not sufficient to say that “human error” or “lack of training” was a root cause for the incident, nor that “training has been improved” as a solution. While a lack of training may have contributed to the issue, it’s also possible that error-prone tools or practices were required, and making those tools less reliant on training is the correct solution. When training or a process is improved, the CA is expected to provide specific details about the original and corrected material, and specifically detail the changes that were made, and how they tie to the issue. Training alone should not be seen as a sufficient mitigation, and focus should be made on removing error-prone manual steps from the system entirely.

Flags: needinfo?(piotr.grabowski)

(In reply to Piotr Grabowski from comment #5)

2020-09-17 07:05:57 UTC - The investigation of the root cause began.
2020-09-17 07:30:37 UTC - We identified the registration policy that issued the problematic certificates.

These time-stamps are remarkably precise, given that you did not submit an incident report at the time. How were these time-stamps recorded?

Also, https://crt.sh/?id=3391273980&opt=zlint,ocsp was revoked at 2020-09-17T07:03:55Z, which would be before your investigation started. How did this happen?

And lastly, as shown in comment 4, not providing prompt incident reports is an incident in itself. Where is the incident report for this incident? Why do you think it (non-prompt incident reports) cannot happen again without even reporting and analyzing it?

I agree with Ryan's and Paul's comments: KIR S.A. should rewrite this incident report with more detail, and then open a new bug with an incident report on why it revoked the certificates but did not consider that an incident which should be reported to Mozilla.

OK, I will provide a more detailed incident report.

Flags: needinfo?(piotr.grabowski)

(In reply to paul.leo.steinberg from comment #7)

(In reply to Piotr Grabowski from comment #5)

2020-09-17 07:05:57 UTC - The investigation of the root cause began.
2020-09-17 07:30:37 UTC - We identified the registration policy that issued the problematic certificates.

These time-stamps are remarkably precise, given that you did not submit an incident report at the time. How were these time-stamps recorded?

Also, https://crt.sh/?id=3391273980&opt=zlint,ocsp was revoked at 2020-09-17T07:03:55Z, which would be before your investigation started. How did this happen?

And lastly, as shown in comment 4, not providing prompt incident reports is an incident in itself. Where is the incident report for this incident? Why do you think it (non-prompt incident reports) cannot happen again without even reporting and analyzing it?

The timestamps are as accurate as the facts and timeline provided. After issuance (the CT log operator's timestamp), we ran post-linting and immediately revoked these 2 certificates (the CRL/OCSP revocation dates). We reacted so promptly that these timestamps are very close together.

2020-09-17 07:19:37 UTC - The certificates were revoked.

The certificate is probably https://crt.sh/?id=3391286036&opt=zlint,ocsp

Also, https://crt.sh/?id=3391273980&opt=zlint,ocsp was revoked at 2020-09-17T07:03:55Z, which would be before your investigation started. How did this happen?

I don't see anything strange here. It seems that they:

  1. Issued https://crt.sh/?id=3391273980&opt=zlint,ocsp
  2. Noticed the issue in https://crt.sh/?id=3391273980&opt=zlint,ocsp
  3. Revoked https://crt.sh/?id=3391273980&opt=zlint,ocsp
  4. Started the investigation
  5. Found https://crt.sh/?id=3391286036&opt=zlint,ocsp and revoked it

Bug report:

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
KIR became aware of this by verifying the certificate content immediately after issuance, but because the issue occurred while applying the new OIDs for compliance reasons, and before the compliance date of 2020-09-30, we did not classify it as an incident, which is regrettable.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2020-09-17 07:01:57 UTC and 2020-09-17 07:04:57 UTC - The 2 certificates were issued.
2020-09-17 07:05:57 UTC - The investigation of the root cause began.
2020-09-17 07:19:37 UTC - The certificates were revoked.
2020-09-17 07:30:37 UTC - We identified the registration policy that issued the problematic certificates.
2020-09-17 08:30:00 UTC - Internal analysis finished with the conclusion that we had applied the DV OID to one of the policies with OV fields.
2020-09-17 12:30:00 UTC - The error was fixed in the registration policy.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
We stopped issuing certificates with the problem immediately after issuance by fixing the registration policy.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
https://crt.sh/?id=3391273980&opt=zlint,ocsp
https://crt.sh/?id=3391286036&opt=zlint,ocsp

5. The complete certificate data for the problematic certificates.

https://crt.sh/?id=3391273980&opt=zlint,ocsp
https://crt.sh/?id=3391286036&opt=zlint,ocsp

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
We accidentally applied the wrong (DV) OID to one of the policies with OV fields while making changes to the registration policy for compliance reasons.
The technical reason was a wrong (false-positive) report from our testing tool. In the test environment, pre-linting was not executed for every certificate issued for testing purposes. We feed our testing tool with a series of certificates and then execute pre-linting. 2 of the 20 certificates generated for testing purposes were not taken into account, and 1 of those 2 was problematic. We probably had a network issue that prevented them from being uploaded to the dedicated directory, but this is a secondary matter. The main cause of the issue was that we did not check the number of certificate files we expected output for. The workaround was to implement a file-count check against the declared input and output for a given test: besides the file input, we configure each test with a manually declared number of files from the original source. The improvement was implemented in a base class shared by all test cases. All previous test cases were re-configured and re-run; no other errors were found.

7. List of steps your CA is taking to resolve the situation and ensure it will not be repeated.

The issue was immediately fixed by modifying the registration policy. We have also hardened the test cases with the self-check described above.
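A minimal sketch of that file-count self-check (hypothetical names; the actual KIR harness is not public): before linting a batch, the harness compares the number of certificate files that actually arrived in the linting directory against the manually declared count from the original source, and refuses to proceed on a mismatch, so silently dropped files can no longer produce a false-positive pass.

```python
import os

def verify_test_batch(lint_dir, declared_count):
    """Fail fast if the number of certificate files that reached the
    linting directory does not match the manually declared input count.

    This models the self-check described above: previously, files
    silently dropped in transit (e.g. by a network error) simply went
    unlinted, and the batch still appeared to pass."""
    actual = [f for f in os.listdir(lint_dir) if f.endswith(".pem")]
    if len(actual) != declared_count:
        raise RuntimeError(
            f"expected {declared_count} certificates in {lint_dir}, "
            f"found {len(actual)}; refusing to run pre-linting"
        )
    return actual
```

As noted later in this bug, the declared count is itself entered by hand, so the check only catches transfer failures, not a mistyped count.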

Trying to make sure I understand Comment #12 fully:

  • A mistake was made putting the DV policy OID into a profile that was for OV.
  • KIR S.A. runs post-issuance linting in the test environment.
  • While this is stated as "pre-linting", that seems at odds with "We feed our testing tool with a series of certificates and then execute pre-linting"
  • This tool is hosted on a separate machine ("Probably we had some network issue")
• The failure of this tool was not detected ("we did not check the number of certificate files we expected output for")

The "fix" here is to make sure the number of outputs match the number of inputs. However, there's a risk here that the number of inputs, which have to be manually configured, may not reflect the actual inputs (which could be more or less), so the risk of this issue still exists if someone also makes a typo on configuring the number of inputs (e.g. if, instead of 20 inputs, they entered 18).

However, omitted from this analysis is any sort of discussion about how the original mistake (entering the wrong policy OID) was made. For example, if this was something that was wrong with the documentation that missed review, whether it was human error when entering, and in either event, what steps can or are being taken to systemically reduce those risks (greater review, explicit configurations, etc).

Finally, we still don't have an answer to Comment #8; based on KIR S.A.'s own report (Comment #5), they were aware of this issue before this incident, but failed to file an incident report.

Have I missed anything?

Flags: needinfo?(piotr.grabowski)

(In reply to Ryan Sleevi from comment #13)

Trying to make sure I understand Comment #12 fully:

  • A mistake was made putting the DV policy OID into a profile that was for OV.
  • KIR S.A. runs post-issuance linting in the test environment.
  • While this is stated as "pre-linting", that seems at odds with "We feed our testing tool with a series of certificates and then execute pre-linting"
  • This tool is hosted on a separate machine ("Probably we had some network issue")
• The failure of this tool was not detected ("we did not check the number of certificate files we expected output for")

The "fix" here is to make sure the number of outputs match the number of inputs. However, there's a risk here that the number of inputs, which have to be manually configured, may not reflect the actual inputs (which could be more or less), so the risk of this issue still exists if someone also makes a typo on configuring the number of inputs (e.g. if, instead of 20 inputs, they entered 18).

The declared input entered in the test case is double-checked by two pairs of eyes to avoid that mistake. In addition, we compare the files in the source and destination directories (dirdiff) to verify that they are the same.
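The "dirdiff" comparison described here can be done with Python's standard library; a sketch, assuming both directories should hold identical certificate files (the function name and the use of Python are illustrative, not KIR's actual tooling):

```python
import filecmp

def directories_match(source_dir, dest_dir):
    """Return True only if both directories contain exactly the same
    file names with identical contents (a dirdiff-style check)."""
    cmp = filecmp.dircmp(source_dir, dest_dir)
    # Any file present on only one side, or uncomparable, means the
    # test input was not transferred completely.
    if cmp.left_only or cmp.right_only or cmp.funny_files:
        return False
    # shallow=False forces a byte-by-byte comparison of file contents.
    match, mismatch, errors = filecmp.cmpfiles(
        source_dir, dest_dir, cmp.common_files, shallow=False)
    return not mismatch and not errors
```

Unlike the manually declared file count, this check needs no hand-entered number, so it also catches the typo scenario raised in comment #13.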

However, omitted from this analysis is any sort of discussion about how the original mistake (entering the wrong policy OID) was made. For example, if this was something that was wrong with the documentation that missed review, whether it was human error when entering, and in either event, what steps can or are being taken to systemically reduce those risks (greater review, explicit configurations, etc).

It was a human error. For that specific case we improved the naming conventions for registration policies, replacing the abbreviated parts of their names (DV, OV) with the full names (DomainValidation, OrganizationValidation).
As a systematic improvement, we have also introduced a full-team review for every new entry/field/OID in the registration policies.

Finally, we still don't have an answer to Comment #8; based on KIR S.A.'s own report (Comment #5), they were aware of this issue before this incident, but failed to file an incident report.

In that specific case we thought it was not an incident because it occurred before 2020-09-30 (when the DV and OV OIDs went live). We have re-read the BRs, and we are now fully aware of the definition of an incident. We will file a report whenever we fail to comply with ANY requirement of this policy - whether it be a misissuance, a procedural or operational issue, or any other variety of non-compliance.

Have I missed anything?

Flags: needinfo?(piotr.grabowski)

No update here.

Flags: needinfo?(bwilson)

It appears, given the number of bugs filed for KIR issues in a short period of time, that there are systemic problems that will need to be addressed. I'm going to leave this bug open for a period of time while we examine KIR's issues.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] → [ca-compliance] Next update 2021-07-01

No update here.

No update here. Are there any additional questions?


Ben: I agree with you that KIR S.A.'s overall trends and handling are concerning, but I'm not sure there are any remaining follow-up tasks on this bug specifically.

Flags: needinfo?(bwilson)

I am going to close this bug. However, I will consider this bug along with the other ones reported this past April as indicative of a need for KIR to become more detail-oriented in its CA activities and place greater attention on its compliance with requirements. So while I am marking this as "fixed", I do not mean to imply that KIR has fixed its problems - only that it has met its incident-reporting obligation.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2021-07-01 → [ca-compliance]
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [dv-misissuance]