Closed Bug 1705337 Opened 3 years ago Closed 3 years ago

KIR S.A.: Invalid localityName + CRL Revoked but OCSP Unknown

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: michel, Assigned: piotr.grabowski)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Attachments

(1 file)

Hello,
I noticed this certificate https://crt.sh/?id=2886075910&opt=ocsp that has the value of localityName set to Waraszawa which looks like a typo of Warszawa. This certificate was revoked in CRL, but OCSP returns Cert Status: unknown. That reminds me of https://bugzilla.mozilla.org/show_bug.cgi?id=1525082

Assignee: bwilson → piotr.grabowski
Status: NEW → ASSIGNED
Whiteboard: [ca-compliance]

OCSP for this certificate now returns: revoked.

The technical change for OCSP/CRL synchronization was deployed to production. A full scan was executed. No certificates with OCSP status unknown were found.
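
For readers following along, the kind of full scan described above can be sketched roughly as follows. This is only an illustration, not KIR's actual tooling: the function name, the `OcspStatus` enum, and the in-memory snapshots of CRL serials and OCSP statuses are all assumptions, and fetching or parsing real CRL and OCSP data is left out.

```python
from enum import Enum


class OcspStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


def find_crl_ocsp_mismatches(crl_revoked_serials, ocsp_status_by_serial):
    """Return serials revoked in the CRL whose OCSP status is not 'revoked'.

    Both inputs are hypothetical in-memory snapshots; retrieving the real
    CRL and OCSP responses is out of scope for this sketch.
    """
    mismatches = {}
    for serial in sorted(crl_revoked_serials):
        # A serial missing from the OCSP snapshot is treated as UNKNOWN.
        status = ocsp_status_by_serial.get(serial, OcspStatus.UNKNOWN)
        if status is not OcspStatus.REVOKED:
            mismatches[serial] = status
    return mismatches


# Made-up serial numbers, for illustration only.
crl = {"0a01", "0a02", "0a03"}
ocsp = {"0a01": OcspStatus.REVOKED, "0a02": OcspStatus.UNKNOWN}
for serial, status in find_crl_ocsp_mismatches(crl, ocsp).items():
    print(serial, status.value)
```

An empty result from such a scan would correspond to the "no certificates with OCSP status unknown" outcome reported here.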

What's the ETA for the incident report for this?

Flags: needinfo?(piotr.grabowski)

Invalid localityName + CRL Revoked but OCSP Unknown

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

KIR noticed the typo during an internal self-audit, as soon as the certificate was generated.
Additionally, KIR was informed by a third party in bug https://bugzilla.mozilla.org/show_bug.cgi?id=1705337 (Invalid localityName + CRL Revoked but OCSP Unknown).
This report addresses the invalid localityName: a typo in the locality name, Waraszawa instead of Warszawa. The OCSP Unknown issue was handled in bug https://bugzilla.mozilla.org/show_bug.cgi?id=1705657

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2020-06-01 09:43 UTC Certificate was issued.
2020-06-01 09:43 UTC Certificate was revoked.
2021-04-15 09:38 PDT KIR was assigned to this issue.
2021-04-15 13:27 PDT The certificate status in OCSP became revoked.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

Yes.
We are in the process of reviewing all currently active certificates to make sure the Locality/State information is accurate and does not contain any typos. In parallel, we are performing a full review of historically issued certificates for inaccuracies in the Locality/State information.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

https://crt.sh/?id=2886075910&opt=ocsp

5. The complete certificate data for the problematic certificates.

https://crt.sh/?id=2886075910&opt=ocsp

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The typo was in the certificate request. There are two root causes. First, just as in bug https://bugzilla.mozilla.org/show_bug.cgi?id=1705647, the certificate was requested for a KIR internal system, and at that time the accuracy of such a request was verified by only one operator. The operator overlooked the typo during request verification, but it was noticed as soon as the certificate was generated, and the certificate was revoked at once. Second, we have not yet implemented an automatic validator for permitted Locality/State values, which is why we rely on verification by an operator.

7. List of steps CA is taking to resolve the situation and ensure it will not be repeated.

All certificates, regardless of who ordered them, are now verified by two operators. We will use the blacklist register mentioned in point 7 of https://bugzilla.mozilla.org/show_bug.cgi?id=1705647#c8 to record all rejected values. The results of the review of active and historically issued certificates will be an input to the blacklist register.
In addition, we have opened a project to automate the certificate generation process. Within this project we will implement automatic validation of the data in requests. We will integrate publicly available registers of states and cities, treated as correlated values, and use them as a whitelist for validating incoming certificate requests. We will integrate this into our certificate enrolment flow to ensure that only requests with accurate values can be processed.
Additionally, cyclic scans will be executed as post-issuance tests.
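
As a rough illustration of the validation described in point 7, the blacklist register and the whitelist of correlated state/locality values could be combined along these lines. This is a minimal sketch under stated assumptions: the data structures, the function name `validate_subject`, and the sample entries are hypothetical and stand in for the real registers.

```python
# Hypothetical allowlist: state -> localities registered within it.
# A real deployment would load this from an authoritative public register.
ALLOWED_LOCALITIES = {
    "mazowieckie": {"Warszawa", "Radom"},
    "małopolskie": {"Kraków"},
}

# Hypothetical blocklist of values rejected in earlier manual reviews.
REJECTED_VALUES = {"Waraszawa"}


def validate_subject(state, locality):
    """Return a list of problems; an empty list means the pair passed."""
    problems = []
    if state in REJECTED_VALUES or locality in REJECTED_VALUES:
        problems.append("value is on the blocklist of previously rejected values")
    localities = ALLOWED_LOCALITIES.get(state)
    if localities is None:
        problems.append(f"unknown state: {state!r}")
    elif locality not in localities:
        problems.append(f"locality {locality!r} is not registered in state {state!r}")
    return problems


print(validate_subject("mazowieckie", "Warszawa"))
print(validate_subject("mazowieckie", "Waraszawa"))
```

Checking the state and locality as a pair, rather than each field on its own, is what would catch a valid city name placed in the wrong state.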

Flags: needinfo?(piotr.grabowski)

Update: we have implemented a subprocess in the request validation flow that automatically checks the state and locality fields against the TERC/SIMC databases of Statistics Poland https://eteryt.stat.gov.pl/eTeryt/english.aspx

We are preparing business requirements and flows for implementing the new features described in point 7 of https://bugzilla.mozilla.org/show_bug.cgi?id=1705337#c4 within the "automation" project. We have started researching a reliable global state/locality/country database.

Have you examined all of your issued certificates to see if they suffer from similar root causes and/or issues? The answer to Question 7 in Comment #4 seems to only be focused on a go-forward basis, as seemingly confirmed by Comment #5. We've seen a number of CA issues caused by a failure to examine historic data, such as existing certificates or existing, "validated" data that was validated using the problematic procedures.

In particular, it would be quite disconcerting if, after closing this bug, either the issue repeats or more examples of similar situations are found. I want to make sure that historic information has been checked against the new procedures, so that KIR S.A. can guarantee that will not be the case.

Flags: needinfo?(piotr.grabowski)

Yes, we have planned the scans for the upcoming week and we should have the full results by 9 June. I will share the outcome here.

Flags: needinfo?(piotr.grabowski)

We found 15 more cases of similar typos:

ST=Małopolska -> should be Małopolskie
ST=Wielkopolska -> should be Wielkopolskie
ST=slaskie -> should be śląskie
ST=pomoskie -> should be pomorskie
L=Krakow -> should be Kraków
L=Poznan -> should be Poznań
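
Typos of this kind (missing diacritics, a dropped letter, a wrong ending) are the sort of thing a post-issuance scan could flag automatically. The following is a hedged sketch only: the short `CANONICAL` list, the helper names, and the 0.8 similarity cutoff are assumptions chosen for illustration, not a description of the scan that was actually run.

```python
import difflib
import unicodedata

# Hypothetical reference list; a real scan would load the full register.
CANONICAL = ["małopolskie", "wielkopolskie", "śląskie", "pomorskie", "Kraków", "Poznań"]


def fold(name):
    """Lowercase and strip diacritics so 'Kraków' and 'Krakow' compare equal."""
    # 'ł' has no Unicode decomposition, so it is mapped by hand first.
    name = name.lower().replace("ł", "l")
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest_correction(value, canonical=CANONICAL):
    """Return the closest canonical name for a non-canonical value.

    Returns None when the value is already canonical or nothing is close.
    """
    if value in canonical:
        return None  # exact match, nothing to correct
    by_folded = {fold(c): c for c in canonical}
    matches = difflib.get_close_matches(fold(value), list(by_folded), n=1, cutoff=0.8)
    return by_folded[matches[0]] if matches else None


for raw in ["Małopolska", "slaskie", "pomoskie", "Krakow", "Kraków"]:
    print(raw, "->", suggest_correction(raw))
```

A fuzzy match like this only produces candidates for human review; it should not silently rewrite subject fields.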

no update here

Piotr: Just to make sure there's no misunderstanding

This incident covered two sets of issues: localityName being incorrect and the OCSP unknown. The OCSP Unknown issue, and its incident report, is largely being tracked in Bug 1705657, despite comments like Comment #1 / Comment #2, so this focuses on the "incorrect locality name" issue.

Comment #4, despite failing to meet the expectations for an Incident Report (by not including concrete timelines), committed to the following actions:

| Date | Reference | Action |
|---|---|---|
| 2021-04-15 | Comment #0 | Incident Reported |
| 2021-05-06 | Comment #4 | All certificates (DV, OV, IV, EV) require two person validation |
| 2021-05-06 | Comment #4 | Any requests that are rejected during manual review will also be added to a blocklist, for automatic rejection in the future. |
| 2021-05-17 | Comment #5 | State and locality fields are now validated against TERC/SIMC databases of Statistics Poland |
| 2021-06-09 | Comment #9 | KIR S.A. completes the scan of its historic issuance (including both active and expired certificates, per Comment #4) |

Just making sure I didn't overlook any commitments from Comment #4, and that they are all completed now, as of Comment #10. Is this correct?

If so, I want to draw attention to the expectation to provide a

> List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

Comment #4 did not provide such a timeline, and that further makes it difficult to piece together KIR S.A.'s remediation steps. In line with Bug 1705187, Comment #30, it's important to make sure KIR S.A. overhauls its incident reporting process to better align with community expectations, particularly regarding timeliness of updates and commitments.

This is particularly important given that KIR S.A. appears to have overlooked recent discussion about the expectations. Mozilla Root Store Policy v2.7.1 requires that:

> CAs MUST follow and be aware of discussions in Mozilla's dev-security-policy forum, where Mozilla's root program is coordinated.

And it does not appear to be the case here.

Flags: needinfo?(piotr.grabowski)

(In reply to Ryan Sleevi from comment #12)

> Piotr: Just to make sure there's no misunderstanding

> This incident covered two sets of issues: localityName being incorrect and the OCSP unknown. The OCSP Unknown issue, and its incident report, is largely being tracked in Bug 1705657, despite comments like Comment #1 / Comment #2, so this focuses on the "incorrect locality name" issue.

Agreed.

> Comment #4, despite failing to meet the expectations for an Incident Report (by not including concrete timelines), committed to the following actions:

> | Date | Reference | Action |
> |---|---|---|
> | 2021-04-15 | Comment #0 | Incident Reported |
> | 2021-05-06 | Comment #4 | All certificates (DV, OV, IV, EV) require two person validation |
> | 2021-05-06 | Comment #4 | Any requests that are rejected during manual review will also be added to a blocklist, for automatic rejection in the future. |
> | 2021-05-17 | Comment #5 | State and locality fields are now validated against TERC/SIMC databases of Statistics Poland |
> | 2021-06-09 | Comment #9 | KIR S.A. completes the scan of its historic issuance (including both active and expired certificates, per Comment #4) |

We have reviewed our previous incident reports and we do see room for improvement, so that we better share lessons learned that could help all CAs build better systems.
This also applies to timelines: we sometimes did not include concrete events that would help in tracking and reviewing an incident and would make it easier to piece together the remediation steps, as you note further on.

> Just making sure I didn't overlook any commitments from Comment #4, and that they are all completed now, as of Comment #10. Is this correct?

Yes, they are all completed now.

> If so, I want to draw attention to the expectation to provide a

> > List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

> Comment #4 did not provide such a timeline, and that further makes it difficult to piece together KIR S.A.'s remediation steps. In line with Bug 1705187, Comment #30, it's important to make sure KIR S.A. overhauls its incident reporting process to better align with community expectations, particularly regarding timeliness of updates and commitments.

We will certainly do our best to make future reports (if any) as accurate and precise as possible. In this particular case we filed the report on 2021-05-06, so we could not yet include the last two actions.

> This is particularly important given that KIR S.A. appears to have overlooked recent discussion about the expectations. Mozilla Root Store Policy v2.7.1 requires that:

> > CAs MUST follow and be aware of discussions in Mozilla's dev-security-policy forum, where Mozilla's root program is coordinated.

We have read the discussion carefully and with understanding and we are going to follow the recommendations and conclusions contained therein.

> And it does not appear to be the case here.

Flags: needinfo?(piotr.grabowski)

No update here. Are there any additional questions?

Setting N-I for Ben here to consider closing out.

I'm hoping KIR S.A. recognizes that, while they're taking steps for this particular issue (covered in Comment #12), the design philosophy of "technical controls, strong automation" is something that needs to be examined throughout the CA's operations. Other CAs' bugs, such as Bug 1712188 (e.g. Bug 1712188, Comment #19), highlight the importance of ensuring that it's not just a "per-feature" consideration, but one that permeates all of the CA's decision making.

Flags: needinfo?(bwilson)

I think this can now be closed and will schedule myself to do so on or about this Friday, 16-July-2021.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ocsp-failure]