Closed Bug 1709392 Opened 3 years ago Closed 3 years ago

Asseco DS / Certum: Invalid stateOrProvinceName field (recurrent incident)

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fozzie, Assigned: aleksandra.kurosz)

Details

(Whiteboard: [ca-compliance] [ov-misissuance])

Certum has issued the following certificates with invalid stateOrProvinceNames:

This seems to be a recurrence of bug 1667986.

Two of these certificates were issued before that bug was filed, showing that a retrospective scan of Certum certificates did not identify these certificates. One certificate was issued after that bug was filed, showing that Certum doesn't have adequate control methods preventing further misissuance.

Assignee: bwilson → aleksandra.kurosz
Status: NEW → ASSIGNED
Whiteboard: [ca-compliance]
See Also: → 1667986

Thank you for reporting this. We are checking our previous analysis and will come back with an incident report.

  1. How your CA first became aware of the problem.

On Tuesday, May 4th, 2021, Certum received a report from a third party about misissued certificates. The report indicated that certificates had been issued with an incorrect value in the stateOrProvinceName field: certificates that should have been covered by bug 1667986, plus one new certificate.

  2. A timeline of the actions your CA took in response.

Time in UTC.

2021-02-03 Bug 1667986 was closed.

2021-02-15 Analysis started of possible stateOrProvinceName checking with additional tools or external APIs to support operators.

2021-03-08 Additional stateOrProvinceName checking was introduced as part of address validation. We decided to add this as a manual step at this point and to add automated validation to the future development roadmap.

2021-03-29 Analysis started of the possibility of creating the role of an operator with lower powers to issue certificates. An operator with lower powers would require their decisions to be confirmed by a second person with regular powers.

2021-04-01 The role of an operator with lower powers was accepted and added to the current development roadmap for Q2 2021.

2021-05-04 17:50 Notification received via the email address revoke@certum.pl.

2021-05-04 18:00 This bug was created.

2021-05-04 18:00 Preliminary analysis started.

2021-05-05 08:00 The second stage of analysis started. We re-analyzed the previous stateOrProvinceName report from bug 1667986 and confirmed that 2 of the 3 reported certificates should have been revoked as part of the previous bug.

2021-05-05 13:00 We carried out a new analysis of stateOrProvinceName for the period between 2021-01-01 and 2021-05-05. We found only one certificate with a country in this field (Moldova) and one additional certificate for the same client with stateOrProvinceName filled in as MD.

2021-05-07 13:00 As a result of re-analyzing the data from the previous bug, we found 5 certificates with a country in the stateOrProvinceName field (Malta). We scheduled revocation within 5 days.

2021-05-09 17:50 4 affected certificates were revoked.

  3. Confirmation that your CA has stopped issuing TLS/SSL certificates with the problem.

The additional stateOrProvinceName field checking we introduced has shown its effectiveness in the form of a reduced number of errors compared to the previous bug. We continue to take measures to reduce the number of errors that may result from the manual processing of certificate applications.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

The first certificates were described in bug 1667986; the last certificate was issued on 2021-03-02.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

https://crt.sh/?serial=7d935a69eab7f34740bc54fdece99921

https://crt.sh/?serial=5697a4d99cdee6664d708d725a083eb0

https://crt.sh/?serial=62f18512bd8d28ae2e00479b3d17407a

https://crt.sh/?serial=a1f81367ca44052eb367a6f4d9b8f5f

https://crt.sh/?serial=10dd2e9674ac781d0903a50d32cbbcdc

https://crt.sh/?serial=70be53f954d89b5f47ff2f45bc56c8c3

https://crt.sh/?serial=3b852f0f73eb2500cfe8b51596f4a446

https://crt.sh/?serial=1c8b25cbb9dddb9d83d98b0ffcf1c340

https://crt.sh/?serial=52393a5d366a02286c4660e14bdb419d

  6. Explanation about how and why the mistakes were made, or bugs introduced, and how they avoided detection until now.

For certificate https://crt.sh/?serial=62f18512bd8d28ae2e00479b3d17407a, a new person in the vetting team relied on information from the previous, not yet revoked certificates https://crt.sh/?serial=7d935a69eab7f34740bc54fdece99921 and https://crt.sh/?serial=5697a4d99cdee6664d708d725a083eb0.

For certificate https://crt.sh/?serial=a1f81367ca44052eb367a6f4d9b8f5f, a person from the vetting team appears to have accidentally copied the value from countryName.

For certificates https://crt.sh/?serial=10dd2e9674ac781d0903a50d32cbbcdc, https://crt.sh/?serial=70be53f954d89b5f47ff2f45bc56c8c3, https://crt.sh/?serial=3b852f0f73eb2500cfe8b51596f4a446, https://crt.sh/?serial=1c8b25cbb9dddb9d83d98b0ffcf1c340 and https://crt.sh/?serial=52393a5d366a02286c4660e14bdb419d, a person from the vetting team appears to have accidentally copied the full name of a country from the source verification document. As all these certificates were issued within a short period of time, the first error was copied to all of them.

In the previous bug, the stateOrProvinceName verification was done manually. A couple of certificates were marked as valid, including the one with Moldova. Because of this verification error, some certificates were missing from the list when we were looking for certificates to revoke.

Both the misissuances and the failure to find errors during manual verification are the result of human error, which we determined to be the main issue behind all stateOrProvinceName errors in this and the previous bug. Therefore, our main goal is to support the manual process with tools and reduce human error as much as possible. We describe the steps taken in point 7.

  7. List of steps your CA is taking to resolve the situation.

Bug 1667986 showed us that human error is a serious problem we need to eliminate from the address verification process, in particular for stateOrProvinceName. Therefore, even after bug 1667986 was closed, we took steps to reduce human error in this process:

  • We decided to apply double-checking to OV certificates as well, as most of the affected certificates are OV. In addition, we created the role of an operator with lower powers to issue certificates. An operator with lower powers would require their decisions to be confirmed by a second person with regular powers. We started implementing this solution after closing the previous bug; it is now in the testing phase, and we expect to bring it to the production system in June.

  • We are going to introduce automated address verification to support operators and minimize the possibility of human error. This change will apply not only to the stateOrProvinceName field but also to other address fields, e.g. locality name. We have finished the initial research and are now checking the effectiveness of the available solutions. We plan to deploy it into production by the end of this year.

  • We are even considering removing stateOrProvinceName from our SSL profiles, as it is an optional field. The final decision will be based mainly on the effectiveness of automated address verification (the initial results show that, for stateOrProvinceName, the available tools do not give the expected results).

  • In addition to the self-audit specified in the CABF BR, we decided to introduce a monthly check of issued certificates for stateOrProvinceName until the automated address verification is in place. It should help us detect misissuances in a timely manner.

  • We are conducting an analysis of all active certificates for any type of error in stateOrProvinceName. We will report the results of our scan in this bug.
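
The monthly check and the scan of active certificates listed above could be partly automated. What follows is only a minimal sketch, assuming the subject fields have already been extracted into dictionaries. The `COUNTRIES` table, the function names and the sample records are hypothetical; they illustrate the class of error seen in this bug (a country name or the certificate's own country code in stateOrProvinceName), not Certum's tooling.

```python
# Hypothetical, deliberately tiny country table (alpha-2 code -> English name).
# A real scan would load the full ISO 3166-1 list.
COUNTRIES = {"MD": "Moldova", "MT": "Malta", "PL": "Poland"}


def st_looks_like_country(subject):
    """Return True if stateOrProvinceName (ST) holds a country name,
    or repeats the certificate's own countryName (C) code."""
    st = subject.get("ST", "").strip().casefold()
    if not st:
        return False  # ST is optional; an absent value is not this error
    country_names = {name.casefold() for name in COUNTRIES.values()}
    own_code = subject.get("C", "").strip().casefold()
    return st in country_names or st == own_code


def scan(subjects):
    """Return the subjects whose ST value should be reviewed by a human."""
    return [s for s in subjects if st_looks_like_country(s)]


# Illustrative records only, not real certificate data.
sample = [
    {"C": "MD", "ST": "Moldova", "L": "Chisinau"},
    {"C": "MD", "ST": "MD", "L": "Chisinau"},
    {"C": "MT", "ST": "Malta", "L": "Valletta"},
    {"C": "PL", "ST": "Mazowieckie", "L": "Warszawa"},
]
flagged = scan(sample)  # the first three records are flagged
```

The code comparison is limited to the certificate's own country on purpose: a value such as ST=MD is a legitimate abbreviation when C=US (Maryland). Some names are both a country and a subdivision, so a hit is best treated as a prompt for manual review rather than an automatic rejection.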

Aleksandra: Thanks for the report here. I think it's encouraging to see Asseco DS / Certum is improving in the quality and thoroughness of its reports, comparing Bug 1667986 to this bug.

That said, it seems the proposed mitigations, which Asseco DS/Certum is reporting were already under consideration, are, effectively:

  • Monthly self-audits to detect quality control issues post-issuance
  • Two-party verification for a subset of certificates (those issued by lower power operators)
  • Automated address verification
  • Revisions to certificate profiles

Many of these mitigations have been proposed by other CAs, and each has its own set of failure modes. For example, monthly self-audits help detect, but don't really help prevent or mitigate, and we've seen CAs have issues with staffing being too over-committed to do these effectively.

Two-party verification systems can be good, but this plan seems to allow still for single-party verification (higher power operators), and still runs the risk of both operators making mistakes in judgement (e.g. one operator being persuasive as to why to allow a particular value).

Automated address verification is good, and several CAs have explored mechanisms for this, but as you note, it can tend to be challenging depending on what the CA's policies are for the values they allow (e.g. whether or not to allow localized names). While stateOrProvinceName may lend itself to ISO 3166-2 subdivisions, other fields, such as streetAddress, do not.
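
To make the ISO 3166-2 idea concrete, here is a minimal sketch of an allow-list check. The `SUBDIVISIONS` table is a hypothetical excerpt holding a few Polish voivodeship names and is far from complete, and `check_state` is an illustrative name. Whether a CA accepts local names, English names or codes is a policy decision this sketch does not settle.

```python
# Hypothetical excerpt: country code -> accepted subdivision names.
SUBDIVISIONS = {
    "PL": {"Małopolskie", "Mazowieckie", "Pomorskie"},
}


def check_state(country, state):
    """Classify a stateOrProvinceName value for the given countryName.

    Returns "ok", "unknown-country" (no table loaded for this country),
    or "not-in-list" (the value needs manual review or correction).
    """
    allowed = SUBDIVISIONS.get(country.upper())
    if allowed is None:
        return "unknown-country"
    return "ok" if state.strip() in allowed else "not-in-list"
```

With this table, `check_state("PL", "Małopolskie")` is accepted, while a country name in the field, such as `check_state("PL", "Poland")`, is reported as not in the list. The match is exact by design; relaxing it for case or diacritics is a separate decision.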

So while these are all positive improvements, they are not without challenges. However, as part of thinking deeper about root causes and systemic issues, I think it's also useful for Asseco DS / Certum to examine whether the same concerns here, with stateOrProvinceName, apply to other fields and practices at the CA. That's where the real value of these incident reports comes in.

For example, serialNumber in EV certificates is equally prone to these sorts of issues, both in data entry and in data validation. I think, for example, incidents like Bug 1576013 show a lot of lessons and challenges with validation, and it may make sense to make sure the issues here are being thought of generally, for all fields the CA is validating, and not just strictly stateOrProvinceName.

This also seems to hint at the potential for another issue: how well Asseco DS / Certum is able to keep up with other CAs' incidents, recognize both patterns and potential risks, and apply them to proactive preventative measures. For example, two-party validation has been discussed for years, so it's quite unfortunate that this is only just now being implemented. Similarly, automated validation has been discussed for years, so it's unfortunate that this doesn't look like it'll be in place before end of year, even on an optimistic timeframe.

So I think another area to examine as part of this is "How can we learn from other CA incidents and incorporate those lessons more quickly?" Monitoring Bugzilla is a great way, but it needs to both be reviewed by multiple parties (to avoid a single engineer thinking something is not applicable, which has happened for several other CAs), and it needs to be actionable (e.g. that items like "change certificate profiles" are seriously considered, and not simply dismissed).

Could you share more about what your current practices are for staying aware of industry trends and incident discussion, and whether this incident has changed that? Have you, for example, examined other CAs' incidents regarding stateOrProvinceName to see what you can learn? And if so, can you share incidents you considered, even if you ultimately determined them not applicable?

Flags: needinfo?(aleksandra.kurosz)
Summary: Asseco DS / Certum: Invalid stateOrProvinceName field (recurrant incident) → Asseco DS / Certum: Invalid stateOrProvinceName field (recurrent incident)

I also found https://crt.sh/?id=2208084200 that has stateOrProvinceName: Malopolskia. That's a typo.

(In reply to Michel Le Bihan from comment #4)

I also found https://crt.sh/?id=2208084200 that has stateOrProvinceName: Malopolskia. That's a typo.

This certificate has not been issued by Certum.

Thanks for the questions, Ryan, we'll be responding to them shortly.

As a result of the analysis from point 7, we found 23 more certificates with incorrect entries in the stateOrProvinceName field other than country names (typos, incorrectly verified provinces). The cause of the errors is the same as described in point 6 of the report.

List of certificates:

https://crt.sh/?serial=619c87af342dd2c663d3c282e58f138d
https://crt.sh/?serial=5cd5905935463b053f065ff3fdc02622
https://crt.sh/?serial=7a9b29a7bdcee8396efeb59f0711778f
https://crt.sh/?serial=4eae00fa3824200d94f79a33e696fa40
https://crt.sh/?serial=7cb66914d30a1e6712728a18acf48bbe
https://crt.sh/?serial=7b90a193c43596812421d820f33e6e43
https://crt.sh/?serial=728cc631d04ca4ddd13f08c7cabc32c
https://crt.sh/?serial=7f065104a0b5bb0fc8ee83d018fcaae4
https://crt.sh/?serial=18ff7a68622faf5052408d3e656573f9
https://crt.sh/?serial=26a25abaf8658deb7f223fa8afa84ffb
https://crt.sh/?serial=6219acbdee593d4b72797a049c7db2d7
https://crt.sh/?serial=52e25e923a8ddcb89eb2c324d7416a3f
https://crt.sh/?serial=7e8a6cf374d578afc750d201638d733f
https://crt.sh/?serial=7b14005729b5a3c60cba392f676802da
https://crt.sh/?serial=a78ec14be17fe4f168ab8ba01807dae
https://crt.sh/?serial=2ff9c0a8b4eb23949a0a03b97ff8db2b
https://crt.sh/?serial=38f94ca370b9e0d30fa28ed722a224ec
https://crt.sh/?serial=70568853efc50fbbe44992c8f351a043
https://crt.sh/?serial=5348c18c9510b8c1c66e25ba80733dbd
https://crt.sh/?serial=6dbe7d8c5d37480967992b16f80c85db
https://crt.sh/?serial=332e957b18a0bbc6ac9fcaf00c1f6e7e
https://crt.sh/?serial=30f8ec5e674eb0a3ee0d02d3220b3954
https://crt.sh/?serial=2e9089798a670fd18caa9602199f5213

Certificates will be revoked by 15/05/2021 19:00

The 5 certificates with "Malta" were revoked, the last one on 2021-05-12 12:05.

(In reply to Ryan Sleevi from comment #3)

Ryan, referring to your comment:

That said, it seems the proposed mitigations, which Asseco DS/Certum is reporting were already under consideration, are, effectively:

  • Monthly self-audits to detect quality control issues post-issuance

We are aware that self-audits won't help prevent misissuance. We consider them an additional check to detect errors as soon as possible, not a basic prevention method. We plan to perform this verification both manually and with a prototype of the automated address verification, and to compare and analyze the results.

  • Two-party verification for a subset of certificates (those issued by lower power operators)

There are two changes in this one, both made in our system and not only procedural.

The first change extends 2-step verification to all OV certificates. Currently the system enforces 2-step verification for EV and CodeSigning certificates. After June, 2-step verification will be enforced at the system level for all OV and EV certificates.

The other change creates two roles for operators:

An operator with lower powers will only be able to perform the first verification step, which for OV and EV certificates never results in issuance.

An operator with higher powers will be able to perform either the first step or the second step, but never both.

We assessed that the mistakes are made by operators with less experience, so if the system enforces that an operator with lower powers is always supervised by an operator with higher powers, it should reduce the number of errors. Only a higher-powers operator will be able to actually issue the certificate.

We are aware that this does not eliminate human error, but it should reduce the chance of error and could be added to our system in a short period of time, while adding automated address verification requires more complex changes.
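
The role split described above can be expressed as a small set of system-level rules. This is a minimal sketch under stated assumptions: the names `Operator`, `Request`, `first_step` and `second_step` are hypothetical and are not Certum's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operator:
    name: str
    higher_powers: bool  # False means an operator with lower powers


@dataclass
class Request:
    subject: str
    first_by: Optional[Operator] = None
    issued_by: Optional[Operator] = None


def first_step(req: Request, op: Operator) -> None:
    """Either role may perform the first verification step; it never issues."""
    if req.first_by is not None:
        raise ValueError("first step already done")
    req.first_by = op


def second_step(req: Request, op: Operator) -> None:
    """Only a higher-powers operator who did not do step one may issue."""
    if req.first_by is None:
        raise ValueError("first step missing")
    if req.issued_by is not None:
        raise ValueError("certificate already issued")
    if not op.higher_powers:
        raise PermissionError("lower-powers operator cannot issue")
    if op == req.first_by:
        raise PermissionError("same operator cannot perform both steps")
    req.issued_by = op
```

Under these rules every issuance involves two distinct people, and the second is always an operator with higher powers. As noted in the discussion, this narrows the room for error but does not stop both operators from making the same mistake.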

  • Automated address verification

Automated address verification is, as you mentioned, very challenging. We did research on an L, ST, C data set with GeoNames, OpenStreetMap, UN/LOCODE and ISO 3166-2 and encountered issues mostly with diacritic marks and local names vs English names. After analyzing the results, we learned that removing stateOrProvinceName from the data set gives a significant increase in effectiveness.

This is very promising, but it looks like it will still require human support for all addresses not recognized by automated verification. We need to consider different scenarios and decide whether an invalid value should be corrected or removed, or whether we should ask customers to submit a new request. We also need to decide which method is best, or whether we should use more than one.
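
The diacritic-mark problem mentioned above can be illustrated with Unicode normalization from Python's standard `unicodedata` module. This is a minimal sketch: the helper name `fold` and the `_EXTRA` table are assumptions made for illustration, and the approach does not address local versus English names, which would need an alias table.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# This table is an illustrative assumption, not a complete list.
_EXTRA = str.maketrans({"ł": "l", "ø": "o", "đ": "d"})


def fold(name):
    """Return a comparison key: case-folded, without diacritic marks."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(_EXTRA)
```

For example, `fold("Małopolskie")` and `fold("MALOPOLSKIE")` produce the same key, so a customer-supplied value can be matched against a reference list regardless of how the diacritics were typed. The explicit table matters because letters such as "ł" are not split into a base letter plus a combining mark by normalization alone. The folded key is only for matching; which spelling ends up in the certificate remains a policy choice.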

  • Revisions to certificate profiles

That leads us to the last point, removing stateOrProvinceName from our SSL profiles, which would eliminate the stateOrProvinceName problem but not solve the address validation problem: L and C for OV, and additionally street and postalCode for EV. We could remove street and postalCode from EV certificates, as we noticed some CAs did, and we have this option under consideration too.

Could you share more about what your current practices are for staying aware of industry trends and incident discussion, and whether this incident has changed that? Have you, for example, examined other CAs' incidents regarding stateOrProvinceName to see what you can learn? And if so, can you share incidents you considered, even if you ultimately determined them not applicable?

For almost three years there has been a dedicated team at Certum that tracks industry trends and incident discussions. The team includes employees who hold various positions in the CA: people who issue certificates, quality specialists, administrators, developers and business analysts. The team meets periodically and summarizes its activities and current issues.

When it comes to bug tracking, the team's members watch the Bugzilla CA Certificate Compliance component with all necessary notifications enabled, so all information about newly created or updated bugs goes to our inboxes. If we determine that a given bug may be applicable to us, we start an analysis. The first step is usually to verify whether there was any misissuance. Even if we do not detect any wrongly issued certificates, we continue our efforts to mitigate the risk: sometimes it's a change in procedure, sometimes it's a configuration change, sometimes it's a new version of the system.

We looked for examples to show how bug tracking looks in practice:

https://bugzilla.mozilla.org/show_bug.cgi?id=1632632 - In response to this bug, we verified that we also allowed an empty BC when a customer submits a request, and that we filled that field in later during organization verification. We decided to change the system to always require the BC field from the customer and validate it at each stage (submitting the request, verifying organization data, issuing the certificate), so there is no possibility of issuing a certificate with an incorrect or empty BC. We also checked whether we had issued certificates with a BC value not matching the allowed values, and we found none.

https://bugzilla.mozilla.org/show_bug.cgi?id=1620772 - In response to this bug, we reexamined our Debian weak keys lists to ensure that we properly mitigate that risk and have no gap here. We performed sets of tests to be sure that the system works as expected.

https://bugzilla.mozilla.org/show_bug.cgi?id=1624527 - In response to this bug, we changed our system so that it is not possible to issue a certificate with a key marked as compromised. We also considered integrating our system with the https://pwnedkeys.com/ API as part of our analysis of potential supporting solutions.
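
A check of the kind described for compromised keys can be sketched as a pre-issuance lookup. This is only an illustration under assumptions: `KeyBlocklist` and its methods are hypothetical names, the key is identified by a SHA-256 hash of its DER-encoded SubjectPublicKeyInfo, and no external service is queried.

```python
import hashlib


def spki_fingerprint(spki_der: bytes) -> str:
    """SHA-256 over the DER-encoded SubjectPublicKeyInfo, as lowercase hex."""
    return hashlib.sha256(spki_der).hexdigest()


class KeyBlocklist:
    """In-memory set of fingerprints of keys marked as compromised."""

    def __init__(self):
        self._fingerprints = set()

    def mark_compromised(self, spki_der: bytes) -> None:
        self._fingerprints.add(spki_fingerprint(spki_der))

    def check_before_issuance(self, spki_der: bytes) -> None:
        """Raise ValueError if the key must not be certified."""
        if spki_fingerprint(spki_der) in self._fingerprints:
            raise ValueError("public key is marked as compromised")
```

Calling `check_before_issuance` at the point of issuance, rather than only when the request is received, also covers keys that are marked as compromised while a request is still being processed.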

Regarding stateOrProvinceName incidents: before https://bugzilla.mozilla.org/show_bug.cgi?id=1667986 we had not analyzed them in as much detail as perhaps we should have. It seems to me the main reason may be that just a few years ago, with a significantly smaller number of issued certificates, we were able to carry out most of the process manually. Currently, with the increase in the number of issued certificates, we see that adding automation to minimize human errors is essential.

We are aware that we should not focus here just on the stateOrProvinceName issue but also on all other error-prone fields, and we are making efforts to address this problem holistically.

Flags: needinfo?(aleksandra.kurosz)

All certificates mentioned in comment 6 have been revoked, the last one at 2021-05-15 17:26.

There are no updates on this bug. Are there any additional questions?

There are no updates on this bug. We are in the process of conducting additional analyses.

I'm setting N-I for Ben, as I think this answers all the questions. I appreciate the detailed reply in Comment #8 that speaks to the systemic issues.

Flags: needinfo?(bwilson)

All certificates mentioned in comment 13 have been revoked, the last one at 2021-06-16 11:28.

There are no updates on this bug. Are there any additional questions?

No updates. If there are no additional questions, can this bug be closed?

Ben?

I'll schedule to close this on or about Wed. 2021-07-07.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ov-misissuance]