Bug 1567456 (Open). Opened 5 months ago, updated 9 days ago.

T-Systems: "Some-State" comparable issues

Categories: NSS :: CA Certificate Compliance (task)

Tracking: Not tracked

Status: ASSIGNED

People: Reporter: Arnold.Essing; Assignee: Arnold.Essing

Details: Whiteboard: [ca-compliance] Next Update - 30-November 2019

Attachments (1 file): 20190715-T-Systems.csv (3.21 KB, application/vnd.ms-excel)

Description:
In the course of the "Some-State" topic, we started an internal investigation, carried out by a separate department within our company, to determine whether the active database contains other errors comparable to "Some-State". The analysis showed that erroneous certificates had been issued. In addition to typing and spelling errors, the L and ST fields were frequently swapped, and ST frequently repeated the country name.
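The three error classes named above can be told apart mechanically once a reference table of valid locations exists. The following is a minimal sketch under stated assumptions, not T-Systems' actual tooling: `LOCATIONS` and `COUNTRY_NAMES` are hypothetical, hand-written sample tables standing in for a real location database.

```python
from typing import Dict, Optional, Set

# Hypothetical reference data: country code -> state -> localities.
# Sample entries only; a real table would come from a location database.
LOCATIONS: Dict[str, Dict[str, Set[str]]] = {
    "DE": {
        "Bayern": {"München", "Nürnberg"},
        "Hessen": {"Darmstadt", "Frankfurt am Main"},
    },
}

# Hypothetical country names that must never appear as stateOrProvinceName.
COUNTRY_NAMES: Dict[str, Set[str]] = {"DE": {"Germany", "Deutschland"}}


def classify(c: str, st: str, l: str) -> Optional[str]:
    """Label the error in a (C, ST, L) triple, or return None if it is consistent."""
    states = LOCATIONS.get(c)
    if states is None:
        return "unknown country"
    if st in states and l in states[st]:
        return None  # consistent combination
    if st in COUNTRY_NAMES.get(c, set()):
        return "country name in ST"
    if l in states and st in states[l]:
        return "L and ST swapped"
    return "unrecognized value (possible typo)"
```

For example, `classify("DE", "München", "Bayern")` reports a swap, while a misspelled locality falls through to the last, catch-all label.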

  1. How your CA first became aware of the problem, and the time and date.
    2019-07-15 06:00 UTC: Received the internal report with the list of erroneous certificates.

  2. A timeline of the actions your CA took in response.
    2019-07-15 06:00 UTC: Received the internal report with the list of erroneous certificates.
    2019-07-15 08:30 UTC: Conference call to discuss further actions.
    2019-07-15 10:18 UTC: The affected customers were asked to replace the erroneous certificates immediately.
    2019-07-16 13:31 UTC: Our auditor was informed.
    2019-07-16 14:30 UTC: Initiated discussions with the software developer.
    2019-07-19 12:17 UTC: The last affected certificate was revoked.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
    The last affected certificate was issued on June 18, 2019.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    137 valid certificates were affected. The first certificate was issued on July 21, 2016 and the last on June 18, 2019.

  5. The complete certificate data for the problematic certificates.
    See attachment 20190715-T-Systems.csv

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
    In individual cases, these locality checks were not given the necessary importance. Errors arose because the RA personnel interpreted the requirements and instructions in a customer-friendly way. This was mainly because one of our solutions does not allow the CSR to be overwritten; correcting the data would have required the customer to submit a new CSR. As a result, incorrect entries that appeared non-critical were sometimes accepted. Customer typing and spelling mistakes, swapped L and ST values and, e.g., a country name entered as ST were sometimes not perceived as clear mistakes and were not treated as such.
    Furthermore, the GUI used in the review process was confusing and inadequate for validation. This was not addressed with the necessary consequence, and sufficient system engineering measures were not put in place.

  7. List of steps CA is taking to resolve the situation and ensure it will not be repeated.
    As described in bug 1551371 ("T-Systems: "Some-State" in stateOrProvinceName"), we have made our specification more detailed and raised awareness across the entire RA team. The RA team was provided with lookup tables, and technical checks using blacklists were successfully integrated. With our new release on 2019-07-03, the 4-eyes principle is now also technically enforced. The clarity of the GUI has also been significantly improved.
    In addition, we plan to implement further technical aids to reduce the possibility of errors to a minimum. This includes optimizing CSR input in the ordering process. In the future, the customer should be able to choose base values from a database when ordering, so that logical, spelling and syntax errors can no longer occur. We are currently in discussions with our software developer and expect a proposal for the technical implementation by the 8th of August.
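The two technical controls described in item 7, blacklist checks on subject values and a technically enforced 4-eyes principle, can be sketched together as a single issuance gate. This is an illustrative sketch only: the `Order` type, the reviewer IDs and most blocklist entries are hypothetical; only "Some-State" is taken from a real source, as the stateOrProvinceName default in OpenSSL's sample openssl.cnf.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# "Some-State" is the stateOrProvinceName default in OpenSSL's sample
# openssl.cnf; the other entries are hypothetical placeholder examples.
BLOCKLIST: Set[str] = {"some-state", "default city", "n/a", "none", "-", "."}


@dataclass
class Order:
    subject: Dict[str, str]  # e.g. {"C": "DE", "ST": "Hessen", "L": "Darmstadt"}
    approvals: Set[str] = field(default_factory=set)  # IDs of approving RA officers


def blocklisted_fields(subject: Dict[str, str]) -> List[str]:
    """Return the subject attributes whose value is a known placeholder."""
    return sorted(k for k, v in subject.items() if v.strip().casefold() in BLOCKLIST)


def may_issue(order: Order) -> bool:
    """Allow issuance only without placeholders and with two distinct approvers."""
    # approvals is a set, so a repeated approval by the same officer does not count.
    return not blocklisted_fields(order.subject) and len(order.approvals) >= 2
```

Storing approvals as a set makes "two distinct people" a structural property rather than a convention; a fuller version would presumably also exclude the officer who entered the order.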

Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee: wthayer → Arnold.Essing
Whiteboard: [ca-compliance]

Arnold,

Thanks for opening this. I appreciate that:

  • On the basis of the issues, you opened an investigation to look into things more comprehensively
  • Once issues were discovered, you promptly (within the BR timeframe) took steps to revoke
  • This issue was similarly disclosed promptly, regardless of the revocation status.

In terms of root cause <-> mitigations relationship, I'm always concerned when solutions require doubling down on human factors, especially when automated solutions exist. For example, I'm appreciative of the blocklist approach being automated, but it sounds like you also have an allowlist that is not automated, and relies on human factors. Have you considered automating the allowlist as well?
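Ryan's suggestion of an automated allowlist could look roughly like the sketch below: a subject is accepted only if its exact (C, ST, L) triple is in a reference set, and when it is not, pairwise checks name which combination (C/ST, C/L or ST/L, the same combinations the CA mentions examining later in this bug) fails. `ALLOWED` is a hypothetical sample; a production list would presumably be generated from an authoritative location database rather than maintained by hand.

```python
from typing import List, Set, Tuple

# Hypothetical allowlist of valid (C, ST, L) triples. In practice it would be
# generated from an authoritative location database, not maintained by hand.
ALLOWED: Set[Tuple[str, str, str]] = {
    ("DE", "Bayern", "München"),
    ("DE", "Hessen", "Darmstadt"),
    ("DE", "Nordrhein-Westfalen", "Bonn"),
}


def allowlisted(c: str, st: str, l: str) -> bool:
    """Accept only triples that appear verbatim in the allowlist."""
    return (c, st, l) in ALLOWED


def failing_pairs(c: str, st: str, l: str) -> List[str]:
    """Name each pair (C/ST, C/L, ST/L) that never occurs together in ALLOWED."""
    checks = [("C/ST", 0, 1, c, st), ("C/L", 0, 2, c, l), ("ST/L", 1, 2, st, l)]
    return [
        name
        for name, i, j, x, y in checks
        if not any(t[i] == x and t[j] == y for t in ALLOWED)
    ]
```

With this design a human only handles genuinely new locations (by extending the list), rather than judging every mismatch case by case.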

Similarly, the description of problems notes that the system saw the CSR as immutable, but I'm not sure how that relates to "optimization of CSR input", and what that technically means.

To understand the GUI change, seeing what the before and after looked like is probably necessary, both to understand the original issue and to see how it has been corrected. Can you attach screenshots?

From a process management side, have you examined why your self-audits didn't catch this issue? Similarly, have you examined for any other 'customer friendly' behaviours your RA staff may have adopted? I can appreciate that you trained and re-educated staff regarding this particular matter, but I'm worried there may be other historic patterns, and would love to understand how that's being examined and possibly addressed.

I'm tentatively setting the next update to 8 August to provide a new timeline. However, please see about answering these questions before then, and within the next week. Thanks!

Flags: needinfo?(Arnold.Essing)
Whiteboard: [ca-compliance] → [ca-compliance] Next Update - 08-August 2019

Hello Ryan,

Yes, we are considering a more extensive automation approach and have already involved the software experts from the developer.
Revision of the online order form, particularly the CSR handling, is currently under discussion with the software developer. At present, the customer must always submit the country, locality and stateOrProvinceName data as part of the CSR. If this data is erroneous, the customer must therefore create a new CSR. We plan to change this so that the above-mentioned data is additionally carried over from the CSR into the order form, where the customer can edit it in the event of an error.

In the former GUI used by the RA staff, the customer's subject DN details were displayed next to each other in a single row, which was not clear enough. Now, all subject DN fields are clearly arranged one below the other and grouped.

We have agreed with our internal auditor to extend the internal audits to a larger number of certificates in order to increase the likelihood of detecting errors. In addition, since July 11, 2019, our internal auditor has been using a location database covering our target regions to support an optimized review process.
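The effect of a larger audit sample can be quantified. If n certificates are drawn at random, without replacement, from N issued certificates of which k are defective, the probability that the sample contains at least one defective certificate is 1 - C(N-k, n) / C(N, n). A minimal sketch; it assumes uniform random sampling, which the bug does not state, and the figures below are hypothetical.

```python
from math import comb
from typing import Optional


def detection_probability(population: int, defective: int, sample: int) -> float:
    """P(at least one defective item) for a uniform sample without replacement."""
    if not 0 <= defective <= population or not 0 <= sample <= population:
        raise ValueError("counts must satisfy 0 <= defective, sample <= population")
    # comb() returns 0 when sample > population - defective, giving probability 1.
    return 1 - comb(population - defective, sample) / comb(population, sample)


def min_sample(population: int, defective: int, target: float) -> Optional[int]:
    """Smallest sample size whose detection probability reaches target, if any."""
    for n in range(population + 1):
        if detection_probability(population, defective, n) >= target:
            return n
    return None
```

For example, with a hypothetical 1% defect rate, `detection_probability(10000, 100, 50)` is roughly 0.4, whereas a sample of 300 raises it above 0.9, which illustrates why a small detailed-review sample can miss a systematic error.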

Over the past few days, we have been intensively analyzing the results of the evaluation of our certificates with our RA staff to understand this "customer-friendly" behavior. The previous examination of the database, which led to the current bug report, already considered various "historic patterns", such as correct combinations of C and ST, C and L, or L and ST, and the spelling of the C, L and ST fields.

Thanks James. I appreciate the added details, but I don't think those really go to the substance of what I was asking.

For example, your answer that you have extended the internal audits doesn't really answer my question about examining what failed and why.

I'm hoping you can review my Comment #1 and make sure you've provided a thorough explanation of the factors involved and how the previous processes failed, in a way that is clear and understandable, and not just a description of the changes you're making. Either one without the other isn't nearly as valuable as both, in terms of building both confidence in the CAs and strength in the ecosystem.

Hello Ryan,

You are correct; the analysis results for the audit process were missing from our response.
While analyzing the processes, we investigated, among other things, why the faulty certificates were not detected during the internal audits.

To find the reason here, we also reviewed the test methodology of the internal audits. It provides various elements to verify the correctness of the overall process:

The internal audits are divided into the key areas described below. These essentially cover process conformity, documentation, retention and validity periods, checked against both our own requirements and those of the CA/Browser Forum.

On-site inspection of external and internal order documents:

(1) Presence of customer documents
(2) Representation of the completed order in the system
(3) Presence and accuracy of an organizational confirmation, e.g. a commercial register excerpt
(4) Correctness and readability of the order
(5) Match between the order data in the system and the order data in the order documents
(6) Compliance with the specified validity periods
(7) Basic logical processing errors

Furthermore, a portion of this sample is additionally subjected to a detailed technical examination.

Examination of the certificate content:
• Does the certificate decode successfully (basic encoding)?
• Examination of the existing values
• Are the relevant certificate components present and free of errors?

The detailed inspections are carried out via a visual and manual review by the internal auditor.
It appears that the additional detailed technical checks were not carried out in sufficient volume, since none of the errors described was detected during an internal audit.

One reason for this was that the internal audits had so far aimed primarily at maintaining the quality of the validation process. Because the internal audit focused much more on the validation process, it was prone to overlooking the types of certificate errors listed above.

Therefore, we decided to increase the audit scope with respect to the number of certificates tested in detail, to improve the test process technically with additional comparison data, and to raise the internal auditor's awareness of errors and their sources: simple typos (e.g. "t" instead of "tt" or "ttt" instead of "tt") but also content errors (e.g. the "location" is not geographically within the "stateOrProvinceName").
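Near-miss spellings like the doubled-letter errors above can be flagged automatically by comparing each value against a lookup list and reporting close but not identical entries. A minimal sketch using Python's standard-library `difflib.get_close_matches`; the `LOCALITY_LIST` contents and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib
from typing import List

# Hypothetical lookup list of correctly spelled locality names.
LOCALITY_LIST: List[str] = ["Bonn", "Darmstadt", "Mannheim", "Stuttgart"]


def spelling_suggestions(value: str, cutoff: float = 0.8) -> List[str]:
    """Return likely intended spellings for a value not found verbatim in the list."""
    if value in LOCALITY_LIST:
        return []  # exact match: nothing to flag
    return difflib.get_close_matches(value, LOCALITY_LIST, n=3, cutoff=cutoff)
```

For example, "Stutgart" yields `["Stuttgart"]`. A value that is neither listed nor close to any entry also yields an empty list, so it must still be rejected by the exact lookup rather than treated as correct.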

We did not identify any other deficiencies in the audit process which could lead to the errors being overlooked.

As planned, we have worked out an implementation proposal with our software developers.
In addition to the existing checks on the CSR fields (e.g. C against ISO 3166-1), additional automated database-based checks using lookup lists for the C, L and ST subject fields are being implemented.
Discrepancies in the CSR will be highlighted in color to the customer in the frontend immediately. The customer can then make the appropriate correction directly in the frontend and is therefore no longer required to create a new request when a discrepancy is found. The customer can only place requests without apparent errors. This measure minimizes the submission of orders with erroneous subject fields. The planned launch date is November 29, 2019.
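The frontend behaviour described above (each discrepancy highlighted per field, submission blocked until it is resolved) amounts to a validator that returns field-level messages. A minimal sketch; the abbreviated ISO 3166-1 alpha-2 subset, the `STATES` and `LOCALITIES_BY_STATE` lookups and the message texts are hypothetical placeholders for the database-backed lookup lists.

```python
from typing import Dict, Set, Tuple

# Illustrative subset of ISO 3166-1 alpha-2 country codes (not the full list).
ISO_3166_1_ALPHA2: Set[str] = {"AT", "CH", "DE", "FR", "NL"}

# Hypothetical lookup lists standing in for the database-backed ones.
STATES: Dict[str, Set[str]] = {"DE": {"Bayern", "Berlin", "Hessen"}}
LOCALITIES_BY_STATE: Dict[Tuple[str, str], Set[str]] = {
    ("DE", "Bayern"): {"München", "Nürnberg"},
    ("DE", "Berlin"): {"Berlin"},
    ("DE", "Hessen"): {"Darmstadt", "Frankfurt am Main"},
}


def field_discrepancies(subject: Dict[str, str]) -> Dict[str, str]:
    """Return {field: explanation} for the first inconsistent field, or {}."""
    c = subject.get("C", "")
    st = subject.get("ST", "")
    l = subject.get("L", "")
    if c not in ISO_3166_1_ALPHA2:
        # ST and L cannot be checked without a valid country.
        return {"C": "not an ISO 3166-1 alpha-2 code"}
    if st not in STATES.get(c, set()):
        return {"ST": f"unknown stateOrProvinceName for {c}"}
    if l not in LOCALITIES_BY_STATE.get((c, st), set()):
        return {"L": f"locality not found in {st}"}
    return {}


def can_submit(subject: Dict[str, str]) -> bool:
    """The order can be placed only when no field is flagged."""
    return not field_discrepancies(subject)
```

Checking C, then ST, then L in order means the frontend always points the customer at the most fundamental error first.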

Flags: needinfo?(Arnold.Essing)

I have changed the next update to 30-November. If no other questions are asked, please update this bug when the new implementation is launched or if the launch is delayed for any reason.

Whiteboard: [ca-compliance] Next Update - 08-August 2019 → [ca-compliance] Next Update - 30-November 2019

(In reply to Arnold Essing from comment #0)

  5. The complete certificate data for the problematic certificates.
    See attachment 20190715-T-Systems.csv

Again, as was the case in https://bugzilla.mozilla.org/show_bug.cgi?id=1498463#c15 , a lot of these certificates are not logged to CT and, thus, not disclosed. This is particularly troublesome as it implies that you do not try to improve your processes and learn from past incidents, a vital part of any quality management.

When ordering, the customer can disable the option to have their organization-validated (OV) certificates logged to public databases (CT logs, crt.sh, ...).

The new software release, including the additional automated database-based checks using lookup lists for the C, L and ST subject fields, was deployed to the production environment on November 25, 2019.
