Open Bug 1888371 Opened 3 months ago Updated 8 days ago

e-commerce monitoring GmbH: CRLs with mismatched issuer

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: agwa-bugs, Assigned: ca, NeedInfo)

Details

(Whiteboard: [ca-compliance] [crl-failure] [external])

Attachments

(6 files)

e-commerce monitoring has issued the following CRLs (also attached to this bug):

This is a violation of the Baseline Requirements CRL Profile, which states that the CRL Issuer "MUST be byte-for-byte identical to the subject field of the Issuing CA."

Assignee: nobody → ca
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [crl-failure]
Whiteboard: [ca-compliance] [crl-failure] → [ca-compliance] [crl-failure] [external]

Thank you Andrew for bringing this to our attention. This is just to confirm that we have received your report. We are in the process of gathering information and will get back to you in due time.

Quick status update before we are able to provide an incident report: we have already found a workaround to avoid a recurrence. For a sustainable solution that addresses the root cause, we have added more DevOps resources. We are making this a high priority and will keep you updated.

Dear all,

as this week is e-commerce monitoring gmbh's audit week, our compliance resources, which include incident reporting, were mainly focused on that. Our DevOps have been working with high priority on various improvements that address current Mozilla bugs.

For the upcoming week, we have scheduled a test session and remain confident that we can roll out to the production environment soon. We will provide an update before Friday 26th EOB.

Best regards,
Daniel

Incident Report

Summary

On 25th March 2024, e-commerce monitoring gmbh issued two CRLs for SSL certificates in violation of Section 7.2 of the CA/Browser Forum Baseline Requirements in their then-current version 2.0.2.

Impact

Both CRLs were issued for SSL certificates:

  • one under an OV policy, in scope of all major Root Store Policies (CRL I)
  • one under an EV and QWAC policy, in scope of the Microsoft Root Store Policy (CRL II)

e-commerce monitoring gmbh has not stopped further issuance of certificates (N/A) and, obviously, has not stopped the capability to revoke certificates via CRL.

Timeline

All times are UTC.

2024-03-20

  • 19:21 Update of database structure

2024-03-25

  • Minutes before 09:21: correct CRLs for other services issued
  • 09:21 Issuance of CRL I
  • 09:21-09:34 correct CRLs for other services issued
  • 09:34 Issuance of CRL II
  • Minutes after 09:34 correct CRLs were issued for other services

2024-03-28

  • 10:58 this bug was filed

2024-03-29

  • 10:04 Receipt of bug, internal discussions
  • 12:51 First response to bug

2024-04-01

  • Testing of behaviour via other CRL issuance processes in order to find workaround

2024-04-02

  • Deactivate flawed process, confirm workaround, update documentation, inform involved personnel

Root Cause Analysis

We use a fully automated process for CRL issuance. The tool retrieves the necessary data including the issuer from a database in which certain values are entered directly, certain values are based on templates/configuration files and other values are retrieved from other databases.
This logic is the same regardless of whether it is a routine CRL renewal due to expiry or an immediate issue due to revocation. The CRL Issuer field where the error occurred is defined in a configuration file.

Our system also covers various other services that use CRLs and to which the automated procedure described above does not (fully) apply. As a result, such configuration files or other data sources may be present in several places.

A recent update to the database structure resulted in an incorrect configuration file being used to create the CRL. As a result, the issuer field of the CRL was set in a different order than the subject of the issuing certificate. The database update went through our usual development processes, with great care and comprehensive tests, but the test cases were incomplete and did not cover the byte-for-byte-identical comparison.

From this point of view, we consider the insufficient representation of possible test scenarios to be the root cause of this incident.
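
To illustrate why a reordered issuer fails the byte-for-byte requirement, here is a minimal sketch, not the tooling used by the CA. It hand-encodes two X.501 Names in DER with the same attributes in a different order and compares the resulting bytes. The helper names (`der_tlv`, `encode_name`, `issuer_matches_subject`) and the example attribute values are assumptions made for this illustration only.

```python
# Illustrative only: a tiny DER encoder for X.501 Names, just enough to show
# that attribute order changes the encoded bytes. All names are hypothetical.

OID_COUNTRY = bytes([0x55, 0x04, 0x06])       # 2.5.4.6  countryName
OID_ORGANIZATION = bytes([0x55, 0x04, 0x0A])  # 2.5.4.10 organizationName
OID_COMMON_NAME = bytes([0x55, 0x04, 0x03])   # 2.5.4.3  commonName

PRINTABLE_STRING = 0x13
UTF8_STRING = 0x0C


def der_tlv(tag: int, content: bytes) -> bytes:
    """Encode one DER element: tag, definite length, content."""
    n = len(content)
    if n < 0x80:
        length = bytes([n])
    else:
        body = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(body)]) + body
    return bytes([tag]) + length + content


def encode_name(attributes) -> bytes:
    """Encode a Name as a SEQUENCE of single-attribute RDNs, in the given order."""
    rdns = b""
    for oid, string_tag, value in attributes:
        atv = der_tlv(0x30, der_tlv(0x06, oid) + der_tlv(string_tag, value.encode("utf-8")))
        rdns += der_tlv(0x31, atv)  # each RDN is a SET holding one attribute
    return der_tlv(0x30, rdns)


def issuer_matches_subject(crl_issuer_der: bytes, ca_subject_der: bytes) -> bool:
    """The byte-for-byte check: plain equality of the encoded Names."""
    return crl_issuer_der == ca_subject_der


# Example values are made up; they are not the names from the affected CRLs.
attributes = [
    (OID_COUNTRY, PRINTABLE_STRING, "AT"),
    (OID_ORGANIZATION, UTF8_STRING, "Example Org"),
    (OID_COMMON_NAME, UTF8_STRING, "Example Issuing CA"),
]
ca_subject = encode_name(attributes)
crl_issuer_reordered = encode_name(list(reversed(attributes)))

print(issuer_matches_subject(ca_subject, ca_subject))            # True
print(issuer_matches_subject(crl_issuer_reordered, ca_subject))  # False
```

Both Names carry the same attributes and have the same length, yet the comparison fails. A test that only compares the attribute values, rather than the encoded bytes, would therefore not catch this kind of error.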

Lessons Learned

What went well

  • Rapid actions taken: Quick finding of a workaround by highly skilled staff, quick definition of further Action Plan.

What didn't go well

  • Detection by an external party. We are in the process of addressing this through the last two Action Items.
  • Unfavourable overlap of negative factors. We have been working to reduce the dependency on human input and manual processes by technical measures and automated processes. However, this is a longer process, which in turn depends on human input.
  • Delay in incident reporting. No corresponding Action Item here, since it is to be discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1893546

Where we got lucky

  • Early detection (before the next CRL issuance). We are in the process of addressing this through the last two Action Items.
  • Small number of CRLs affected. N/A, no corresponding Action Item.

Action Items

Action Item                           Kind                      Due Date
------------------------------------  ------------------------  ---------------------
Workaround                            Prevent                   Done
Monitoring workaround                 Prevent                   no due date / ongoing
Code analysis                         Prevent, Improve          2024-05-10
Bug Fix                               Resolve, Prevent          2024-05-31
4-eyes Review by Security Staff       Prevent                   2024-06-04
Internal Test System                  Prevent                   2024-06-10
4-eyes Approval by Compliance Staff   Prevent                   2024-06-10
Deployment Productive Systems DevOps  Prevent                   2024-06-17
Improve CRL pre issuance linting      Prevent, Detect, Improve  2024-06-21
Improve Monitoring & Notification     Prevent, Detect, Improve  2024-06-21
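
As an illustration of what a pre-issuance lint for this specific failure could look like, the sketch below extracts the raw issuer bytes from a DER-encoded CRL and the raw subject bytes from the DER-encoded issuing CA certificate, then compares them. It relies only on the field order of TBSCertList and TBSCertificate in RFC 5280. The function names are hypothetical, and this is not a description of e-commerce monitoring's planned implementation.

```python
# Illustrative pre-issuance check. Assumes DER input and single-byte tags,
# which holds for the X.509 fields read here. Function names are hypothetical.

def read_tlv(buf: bytes, pos: int):
    """Return (tag, content_start, end) for the DER element starting at pos."""
    tag = buf[pos]
    first = buf[pos + 1]
    if first < 0x80:
        length, start = first, pos + 2
    else:
        n = first & 0x7F
        length = int.from_bytes(buf[pos + 2 : pos + 2 + n], "big")
        start = pos + 2 + n
    return tag, start, start + length


def children(buf: bytes, start: int, end: int):
    """Yield (tag, raw_element_bytes) for each element inside [start, end)."""
    pos = start
    while pos < end:
        tag, _, nxt = read_tlv(buf, pos)
        yield tag, buf[pos:nxt]
        pos = nxt


def tbs_fields(der: bytes):
    """Raw top-level fields of the to-be-signed structure (first inner SEQUENCE)."""
    _, outer_start, _ = read_tlv(der, 0)
    _, tbs_start, tbs_end = read_tlv(der, outer_start)
    return list(children(der, tbs_start, tbs_end))


def crl_issuer_bytes(crl_der: bytes) -> bytes:
    fields = tbs_fields(crl_der)
    i = 1 if fields[0][0] == 0x02 else 0  # optional version INTEGER
    return fields[i + 1][1]               # signature AlgorithmIdentifier, then issuer


def cert_subject_bytes(cert_der: bytes) -> bytes:
    fields = tbs_fields(cert_der)
    i = 1 if fields[0][0] == 0xA0 else 0  # optional [0] EXPLICIT version
    return fields[i + 4][1]               # serial, signature, issuer, validity, subject


def crl_issuer_is_identical(crl_der: bytes, issuing_ca_cert_der: bytes) -> bool:
    """True only if the CRL issuer equals the CA subject byte for byte."""
    return crl_issuer_bytes(crl_der) == cert_subject_bytes(issuing_ca_cert_der)
```

A check like this would run on the freshly signed CRL before publication and block the release if `crl_issuer_is_identical` returns False. PEM files would need to be decoded to DER first.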

Appendix

Details of affected CRLs

Your timeline states:

2024-04-02

Deactivate flawed process, confirm workaround, update documentation, inform involved personnel

However, the CRLs currently being served at http://service.globaltrust.eu/static/globaltrust-2020-server-ov-1.crl and http://service.globaltrust.eu/static/globaltrust-2015-server-qualified-ev-2.crl continue to have a mismatched Issuer, and have Last Update times of Apr 22 05:37:37 2024 GMT and Apr 22 05:50:39 2024 GMT, respectively.

What did your "workaround" actually do?

as this week is e-commerce monitoring gmbh's audit week, our compliance resources, which include incident reporting, were mainly focused on that. Our DevOps have been working with high priority on various improvements that address current Mozilla bugs.

Can you share how many people are in the positions of policy authority? How many people have administrative access to your CA systems? Also, are there any overlaps between these numbers?

The timeline is missing critical information. For example:

  • When was the last time you had modified the CRL generating code & why?
  • "Updated database structure": the timeline should include what led to that

Also, any specific BR/Policy updates that had an impact on this should also be noted here.

Effectively, the point of the timeline and incident report is for someone who is not familiar at all with your internal systems to understand the full story.


I also assume you know that CAs are required to monitor Bugzilla for other CAs' incidents. Do you currently do this?

If so, can you please share the triage logs you have for this: https://bugzilla.mozilla.org/show_bug.cgi?id=1855997

If you don't currently monitor Bugzilla, can you explain why not?


Looking at your action items:

  • "Improve CRL pre issuance linting" - How?
  • "Improve Monitoring & Notification" - How?
  • "Deployment Productive Systems DevOps" - I'm not sure what this means.
  • "Bug Fix Resolve, Prevent 2024-05-31" - 2 months deadline since the report of this incident to solve this bug? That's ~17% of a year. It is unacceptable to have a CRL system broken for 17% of a year.
  • "4-eyes Review by Security Staff Prevent 2024-06-04" - Why is this taking so long to do? This is something that should absolutely be happening right now.

Deadlines set this far in the future do not inspire confidence in e-commerce as a CA.

Flags: needinfo?(ca)

Dear All,

as correctly identified, the issue is recurring/ongoing.

The workaround described above had to be abandoned. We will explain the context in further communication. For proof that the workaround actually worked, please refer to the attached folder.

It is also worth repeating that there are several other services that are not affected, including SSL/TLS certification services under the same RootCAs. For reference, just one example for globaltrust-2020-server-ev-1 is attached.

The list of Action Items, including their due dates, remains as communicated. Any further material questions will either be addressed in an appropriate bug or taken into account in further communication here.

Best regards,
Daniel

Flags: needinfo?(ca)

Please provide an updated timeline that includes 1) when the workaround stopped working and 2) when you became aware of the workaround no longer working.

Please keep in mind that https://wiki.mozilla.org/CA/Responding_To_An_Incident requires that questions be answered "promptly." One week is just the upper bound, and it's hard to understand why Comment 11 took a full week to provide. I hope that future responses will be provided more promptly.

Per the timeline, these CRLs are not expected to be fixed until 2024-05-31. globaltrust-2020-server-ov-1.crl is for an intermediate CA that is trusted by Mozilla and Chrome for TLS server authentication, has unexpired TLS server certificates chaining to it, and has continued to issue certificates since this incident began. This is not a minor compliance issue, but a problem that renders these CRLs unusable by consumers which validate the CRL's signature, impairing the effective revocation of certificates issued by these CAs. I'm also setting a needinfo for Ben and Ryan to get input about root programs' expectations.

Flags: needinfo?(ryandickson)
Flags: needinfo?(ca)
Flags: needinfo?(bwilson)

Hi Andrew,

The response(s) from e-commerce monitoring GmbH to numerous incidents over the last year (e.g., https://bugzilla.mozilla.org/show_bug.cgi?id=1893546, https://bugzilla.mozilla.org/show_bug.cgi?id=1883711, https://bugzilla.mozilla.org/show_bug.cgi?id=1862004, https://bugzilla.mozilla.org/show_bug.cgi?id=1830536, https://bugzilla.mozilla.org/show_bug.cgi?id=1815534) have fallen short of the Chrome Root Program’s expectations.

Sustained and willing non-compliance with the TLS BRs and root program policies is unacceptable, and we maintain a variety of tools to help protect Chrome users from certificates that put their safety and privacy at risk.

Enforcement mechanisms enacted by root programs, and the visibility of those actions, might not always be immediate or publicly evident. We appreciate the community’s patience as we continue to work through our incident response process, and we appreciate you (and everyone else involved) continuing to share your viewpoints from the public perspective.

-Ryan

Flags: needinfo?(ryandickson)

Hi Andrew,
Just a quick response here. e-commerce monitoring's response here has fallen short of Mozilla's expectations as well. This incident will be included in a summary of issues being prepared.
Ben

Flags: needinfo?(bwilson)

Could we also get some guidance here from Apple?

I'd also ask Microsoft but I'm not sure what email they use here.

Flags: needinfo?(clintw)

Thanks Amir! We're certainly monitoring these incidents very closely. Our requirements for Incident Reporting match the guidance provided on https://www.ccadb.org/cas/incident-report, with the following added statement in Section 3 of https://www.apple.com/certificateauthority/ca_program.html:

Of paramount importance for CA providers when submitting incident reports and participating in all follow-up discussion are:
* a demonstration of quality in investigation and depth of knowledge in the root cause analysis, including analyzing for variants;
* timeliness and transparency in responding to questions; and
* thoroughness and specificity in the identification and implementation of remediation tasks.

Regarding the incident specifics, I fully concur with Andrew's statement in comment #12 regarding the impeded functionality of these CRLs. Similarly, I'm struggling to identify evidence that this incident report meets the requirements of the Apple Root Program.

Flags: needinfo?(clintw)

According to the timeline in Comment 5, the bug fix should have been completed by 2024-05-31. However, e-commerce monitoring is still producing broken CRLs (see attached).

Also, more than a month has passed without a response to my request for an updated timeline in Comment 12.
