Closed Bug 1874196 Opened 2 years ago Closed 1 year ago

SwissSign: difference in upper and lower case between CN field and SAN

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sandy.balzer, Assigned: sandy.balzer)

Details

(Whiteboard: [ca-compliance] [dv-misissuance] [ov-misissuance])

Attachments

(1 file)

Incident Report

Summary

SwissSign received an e-mail from Ben Wilson drawing our attention to a problem with a list of certificates.
Our analysis shows that in these certificates the CN field and the SAN (Subject Alternative Name) differ in upper and lower case.
This violates section 7.1.4.3, "Subscriber Certificate Common Name Attribute", of the Baseline Requirements:
"If the value is a Fully‐Qualified Domain Name or Wildcard Domain Name, then the value
MUST be encoded as a character‐for‐character copy of the dNSName entry value from the
subjectAltName extension. "
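
The requirement quoted above amounts to an exact, case-sensitive string comparison. A minimal Python sketch, assuming the CN and the SAN dNSName values have already been extracted from the certificate as strings; the function names are hypothetical and do not belong to any of the linters:

```python
from typing import List


def cn_matches_san(common_name: str, san_dns_names: List[str]) -> bool:
    """True only if the CN is a character-for-character copy of a SAN dNSName."""
    return common_name in san_dns_names


def differs_only_in_case(common_name: str, san_dns_names: List[str]) -> bool:
    """True if the CN fails the exact check but matches a SAN entry once case is ignored."""
    if cn_matches_san(common_name, san_dns_names):
        return False
    lowered = common_name.lower()
    return any(lowered == name.lower() for name in san_dns_names)


# Hypothetical values illustrating the kind of mismatch described in this report.
example_sans = ["www.example.com", "example.com"]
print(cn_matches_san("www.example.com", example_sans))        # exact copy of a SAN entry
print(differs_only_in_case("WWW.Example.com", example_sans))  # case-only difference
```

A comparison that ignored case would accept the second value, which is why the check has to be an exact match.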

Further root cause analysis is ongoing.

Impact

We assign a low risk to the ecosystem by these mis-issued certificates because all certificates on the list from Ben are already revoked. We can confirm that the system revoked these certificates within seconds because the defined post checks detected a mismatch.

Further investigation is ongoing (the cert list in the attachment is not final yet and contains only the confirmed facts by ongoing analysis)

Timeline

2023-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2023-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

Root Cause Analysis

The root cause analysis is ongoing.

Lessons Learned

What went well

The following is what we can confirm now.

  • Certificates were automatically revoked by the system's post checks

What didn't go well

  • Will be filled in after the final root cause analysis.

Where we got lucky

  • Will be filled in after the final root cause analysis.

Action Items

| Action Item | Kind | Due Date |
|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 |
| 2. Root cause analysis | Detect | latest 2024-01-15 |

Appendix

Details of affected certificates

The attachment contains the list of the 44 certificates from the e-mail.

Update

Impact

Our internal investigation shows, no further certificates were impacted.

Timeline

2023-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2023-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

2023-01-16

  • 13:55 UTC Update Bugzilla with results from the ongoing investigation

Root Cause Analysis

Preliminary Findings:
The design of the workflow is correct.
Before issuance, pre-linting runs (cablint, x509lint, zlint); if it returns a negative result, issuance is stopped. Otherwise the certificate is issued and post-linting starts. If post-linting returns a negative result, the certificate is immediately revoked.
This is the reason why the 44 listed certificates were revoked within a few seconds.
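
The intended workflow can be sketched as follows. This is an illustration only, not SwissSign's CA software: linters are modelled as plain callables that return a list of error strings, and `issue_with_linting`, `sign` and `revoke` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A linter is modelled as a callable that returns a list of error strings.
Linter = Callable[[bytes], List[str]]


@dataclass
class IssuanceResult:
    issued: bool
    revoked: bool
    errors: List[str] = field(default_factory=list)


def run_linters(linters: List[Linter], data: bytes) -> List[str]:
    """Collect the findings of every linter for the given encoding."""
    errors: List[str] = []
    for lint in linters:
        errors.extend(lint(data))
    return errors


def issue_with_linting(
    tbs_certificate: bytes,
    linters: List[Linter],
    sign: Callable[[bytes], bytes],
    revoke: Callable[[bytes], None],
) -> IssuanceResult:
    # Pre-linting gate: a negative result stops issuance before signing.
    pre_errors = run_linters(linters, tbs_certificate)
    if pre_errors:
        return IssuanceResult(issued=False, revoked=False, errors=pre_errors)

    certificate = sign(tbs_certificate)

    # Post-linting: a negative result leads to immediate revocation.
    post_errors = run_linters(linters, certificate)
    if post_errors:
        revoke(certificate)
        return IssuanceResult(issued=True, revoked=True, errors=post_errors)

    return IssuanceResult(issued=True, revoked=False)
```

In this sketch the signing step is reachable only after the pre-linting result has been evaluated, which is the property the design relies on.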

What prevents the workflow from executing correctly is part of the ongoing investigation.

Lessons Learned

What went well

The following is what we can confirm now.

  • Certificates were automatically revoked by the system's post checks
  • The issuance workflow is correctly designed

What didn't go well

  • Will be filled in after the final root cause analysis.

Where we got lucky

  • Will be filled in after the final root cause analysis.

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | latest 2024-01-23 | ongoing |

Update

Impact

Our internal investigation shows that no further certificates will be impacted.

Timeline

2023-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2023-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

2023-01-16

  • 14:00 UTC Update Bugzilla with results from the ongoing investigation

2023-01-16

  • 14:55 UTC Update Bugzilla with further results from ongoing investigation

Root Cause Analysis

Preliminary Findings:
The design of the workflow is correct.
Before issuance, pre-linting runs (cablint, x509lint, zlint); if it returns a negative result, issuance is stopped. Otherwise the certificate is issued and post-linting starts. If post-linting returns a negative result, the certificate is immediately revoked.
This is the reason why the 44 listed certificates were revoked within a few seconds.

What prevents the workflow from executing correctly is part of the ongoing investigation.

Finding #1
The issuing process did not stop after the failed pre-linting. The implemented issuance workflow should have stopped this. The reasons why this failed are still under investigation.

What we know is that in all our implemented test cases the process aborts as it should.

Finding #2
The mismatch between SAN and CN happened because a character check between the two fields was missing. This issue was addressed on 20 September 2023. The bug fix was implemented as part of improvements to the customer interface.

Because of this fix and our trust in the design of the issuance workflow, we did not look for any mis-issued certificates. Nevertheless, the fix means that further mis-issuances of this kind (a SAN/CN mismatch) have not been possible since 20 September 2023.
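
One way to rule out this class of mismatch is to normalize the requested names once and derive both fields from the same normalized values. A minimal sketch, assuming the fix lowercases the names as the action item on letter conversion suggests; `build_names` is a hypothetical helper, not the actual code of the customer interface:

```python
from typing import List, Tuple


def build_names(requested_cn: str, requested_sans: List[str]) -> Tuple[str, List[str]]:
    """Lowercase all requested names and make the CN one of the SAN entries."""
    sans: List[str] = []
    for name in [requested_cn, *requested_sans]:
        lowered = name.strip().lower()
        if lowered not in sans:  # drop duplicates that differ only in case
            sans.append(lowered)
    # The CN is taken from the normalized SAN list, so it is an exact copy of an entry.
    return sans[0], sans


cn, sans = build_names("WWW.Example.com", ["www.example.com", "example.com"])
print(cn, sans)
```

Because the CN is read back from the SAN list rather than from the original request, the two fields cannot diverge in case.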

Lessons Learned

What went well

The following is what we can confirm now.

  • Certificates were automatically revoked by the system's post checks
  • The issuance workflow is correctly designed

What didn't go well

  • After post-linting revoked the pre-cert and cert, no trigger was implemented to start the internal mis-issuance process

Where we got lucky

  • Because of a bug fix that addressed another issue, this mis-issuance is no longer possible (see Finding #2)

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | latest 2024-01-23 | ongoing |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | latest 2023-09-20 | Done |
| 4. When post-linting fails, the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |
Assignee: nobody → sandy.balzer
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Update

Impact

Our internal investigation shows that no further certificates will be impacted.

Timeline

2023-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2023-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

2023-01-16

  • 14:00 UTC Update Bugzilla with results from the ongoing investigation

2023-01-24

  • 20:05 UTC Update Bugzilla with RCA ongoing status

Root Cause Analysis

Preliminary Findings:
The design of the workflow is correct.
Before issuance, pre-linting runs (cablint, x509lint, zlint); if it returns a negative result, issuance is stopped. Otherwise the certificate is issued and post-linting starts. If post-linting returns a negative result, the certificate is immediately revoked.
This is the reason why the 44 listed certificates were revoked within a few seconds.

What prevents the workflow from executing correctly is part of the ongoing investigation.

Bug in the software
Finding #1
The issuing process did not stop after the failed pre-linting. The implemented issuance workflow should have stopped this. The reasons why this failed are still under investigation.

What we know is that in all our implemented test cases the process aborts as it should.

Finding #2
The mismatch between SAN and CN happened because a character check between the two fields was missing. This issue was addressed on 20 September 2023. The bug fix was implemented as part of improvements to the customer interface.

Because of this fix and our trust in the design of the issuance workflow, we did not look for any mis-issued certificates. Nevertheless, the fix means that further mis-issuances of this kind (a SAN/CN mismatch) have not been possible since 20 September 2023.

RCA is in ongoing status.

Lessons Learned

What went well

The following is what we can confirm now.

  • Certificates were automatically revoked by the system's post checks
  • The issuance workflow is correctly designed

What didn't go well

  • After post-linting revoked the pre-cert and cert, no trigger was implemented to start the internal mis-issuance process

Where we got lucky

  • Because of a bug fix that addressed another issue, this mis-issuance is no longer possible (see Finding #2)

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | ongoing |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | latest 2023-09-20 | Done |
| 4. When post-linting fails, the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |

Timeline

2023-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2023-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

2023-01-16

  • 14:00 UTC Update Bugzilla with results from the ongoing investigation

2023-01-24

  • 20:00 UTC Update Bugzilla with RCA ongoing status

2023-01-30

  • 16:45 Update of Bugzilla

Root Cause Analysis

RCA Finding
One of the API interfaces was changed because customers complained about lag, and this change introduced a bug in the API.
The bug made the pre-linting tasks asynchronous, so issuance through that API was not stopped.

We have received a timeline for the bug fix: it is to be delivered by the beginning of next week, and we plan to start the testing and implementation phases as soon as we have received the fix.

Finding #1
The issuing process did not stop after the failed pre-linting. The implemented issuance workflow should have stopped this.
Our analysis shows that this behaviour occurs only on one specific API and was caused by a bug that made pre-linting asynchronous and prevented issuance from being stopped.
We have received a timeline for the bug fix, which will switch back to sequential execution to prevent this error. It is to be delivered by the beginning of next week, and we plan to start the testing and implementation phases as soon as we have received the fix.
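
The failure mode described in this finding can be illustrated with a small `asyncio` sketch. It is a simplified model under stated assumptions, not the CA software: `pre_lint`, `issue_buggy` and `issue_fixed` are hypothetical names, and the short sleep only stands in for linter runtime.

```python
import asyncio
from typing import List


async def pre_lint(tbs: bytes) -> List[str]:
    """Stand-in for the pre-issuance linters; flags any input containing b'BAD'."""
    await asyncio.sleep(0.01)  # simulated linter runtime
    return ["cn_not_in_san"] if b"BAD" in tbs else []


async def issue_buggy(tbs: bytes, issued: List[bytes]) -> List[str]:
    # Linting is started in the background, but issuance does not wait for it.
    lint_task = asyncio.create_task(pre_lint(tbs))
    issued.append(tbs)        # certificate is issued regardless of the lint result
    return await lint_task    # the negative result arrives too late to stop issuance


async def issue_fixed(tbs: bytes, issued: List[bytes]) -> List[str]:
    # Sequential execution: the lint result is evaluated before issuance.
    errors = await pre_lint(tbs)
    if errors:
        return errors         # issuance is stopped
    issued.append(tbs)
    return []


buggy_issued: List[bytes] = []
fixed_issued: List[bytes] = []
print(asyncio.run(issue_buggy(b"BAD", buggy_issued)), buggy_issued)
print(asyncio.run(issue_fixed(b"BAD", fixed_issued)), fixed_issued)
```

Both variants report the same lint error; only the sequential one keeps the faulty request out of the issued list.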

Finding #2
The mismatch between SAN and CN happened because a character check between the two fields was missing. This issue was addressed on 20 September 2023. The bug fix was implemented as part of improvements to the customer interface.

Because of this fix and our trust in the design of the issuance workflow, we did not look for any mis-issued certificates. Nevertheless, the fix means that further mis-issuances of this kind (a SAN/CN mismatch) have not been possible since 20 September 2023.

RCA is now done and we await fix delivery.

Lessons Learned

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | Done |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | latest 2023-09-20 | Done |
| 4. When post-linting fails, the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |
| 5. Implement fix for bug | Mitigate | latest 2024-02-09 | ongoing |
Whiteboard: [ca-compliance] → [ca-compliance] [dv-misissuance] [ov-misissuance]

Two quick clarifications:

  1. Can you provide more details about the software which does pre-issuance linting? For example, if it is third-party software, what are SwissSign's procedures and controls around deploying an updated version of that software? If it is first-party software, why was it not simply rolled back to a previous version? In either case, why was the bug not detected for over 3 months (since September 2023)?
  2. Am I correct in understanding that SwissSign is still running without any pre-issuance linting, because the bugfix to make that process synchronous is still not deployed?

(In reply to Aaron Gable from comment #5)

Two quick clarifications:

  1. Can you provide more details about the software which does pre-issuance linting? For example, if it is third-party software, what are SwissSign's procedures and controls around deploying an updated version of that software? If it is first-party software, why was it not simply rolled back to a previous version? In either case, why was the bug not detected for over 3 months (since September 2023)?
  2. Am I correct in understanding that SwissSign is still running without any pre-issuance linting, because the bugfix to make that process synchronous is still not deployed?

Hi Aaron, thank you for your questions, please find our answers below:

For question 1:
There are 3 linters used for TLS certificates.
These are the pre-issuance linters:
Certlint https://github.com/globalsign/certlint
X509lint https://github.com/kroeckx/x509lint
Zlint https://github.com/zmap/zlint
Yes, these are standard third-party linters. The issue is that, for orders placed over the interface mentioned in an earlier post, our CA system starts the issuance process before it can be stopped on the basis of negative linter results. This asynchronous behavior is the cause of the bug.
This is why we consider the linters to be working and are fixing the bug in our CA software.

Concerning your question about update procedures and controls: we have standardized procedures and controls in place as part of our change management workflow.
New versions of the linters are always bundled with new versions of the software, and we always run all our tests. This includes multiple tests (standard as well as additional ad hoc tests) and regression tests.
We have now added specific test cases for linter results to our test workflow.

The discovery was tricky because we looked at the lower/upper case situation, which had already been fixed with the patch from September (see above), and we have had no mis-issuances since that time.

To control this situation in future, we are implementing an e-mail alert (and a stop of issuance) for pre- and post-linter failures as a mitigation.
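
A minimal sketch of such a mitigation, under the assumption that every linter failure halts issuance and sends an alert, and that a post-linting failure additionally opens the internal mis-issuance process. The class and callback names are hypothetical; actual e-mail delivery is left to the injected `send_alert` callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LintFailure:
    serial: str
    stage: str          # "pre" or "post"
    errors: List[str]


class LintFailureHandler:
    def __init__(
        self,
        send_alert: Callable[[str], None],
        open_incident: Callable[[LintFailure], None],
    ) -> None:
        self._send_alert = send_alert
        self._open_incident = open_incident
        self.issuance_halted = False

    def handle(self, failure: LintFailure) -> None:
        # Any linter failure stops further issuance and notifies staff.
        self.issuance_halted = True
        details = "; ".join(failure.errors)
        self._send_alert(f"{failure.stage}-lint failure for {failure.serial}: {details}")
        # A post-lint failure means a certificate was already issued,
        # so the mis-issuance process is started as well.
        if failure.stage == "post":
            self._open_incident(failure)
```

Injecting the two callables keeps the handler testable without a mail server or ticketing system.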

On question 2:
We are running pre- and post-issuance linting on all TLS certificates. Additionally, the fix from September 2023 prevents the upper/lower case error. We have checked that the post-linter has not returned a negative result since then.
Additionally, we will actively monitor whether the post-linter is triggered (see above).
We are waiting for the fix that will make our CA software behave synchronously again, as expected.

Today we want to give an update on the bug fix that was to be implemented by 9 February 2024.
The expected bug fix did not pass our quality process. We therefore had to ask the third party for a new version and expect the internal quality process to be completed by the end of February.
We will give a precise update by 21 February 2024.

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | Done |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | latest 2023-09-20 | Done |
| 4. When post-linting fails, an alarm is raised and the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |
| 5. Implement fix for bug | Mitigate | testing end of February (update by 2024-02-21) | ongoing |

Today we want to give an update: the bug fix passed our quality process. We implemented the fix in our production environment yesterday and consider the issue fixed since then.

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | Done |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | latest 2023-09-20 | Done |
| 4. When post-linting fails, an alarm is raised and the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |
| 5. Implement fix for bug | Mitigate | implemented 2024-02-20 | Done |

No Updates.

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | Done |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | latest 2023-09-20 | Done |
| 4. When post-linting fails, an alarm is raised and the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |
| 5. Implement fix for bug | Mitigate | implemented 2024-02-20 | Done |

No Updates.

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | Done |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | latest 2023-09-20 | Done |
| 4. When post-linting fails, an alarm is raised and the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |
| 5. Implement fix for bug | Mitigate | implemented 2024-02-20 | Done |

(In reply to Sandy Balzer from comment #0)

Impact

We assign a low risk to the ecosystem by these mis-issued certificates because all certificates on the list from Ben are already revoked. We can confirm that the system revoked these certificates within seconds because the defined post checks detected a mismatch.

Further investigation is ongoing (the cert list in the attachment is not final yet and contains only the confirmed facts by ongoing analysis)

Timeline

2023-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2023-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

Are you able to provide an updated and corrected timeline? Those dates are incorrect, right? This bug was opened in 2024. The timeline should include all relevant past events that led up to the incident and when it started, such as when the certificates were issued, when software was updated, when the fix for the character check was deployed, etc.

The impact section should include details about what was impacted. Since this was certificate mis-issuance, I would expect at least a description of how many certificates were mis-issued and the dates of the first and last mis-issuance.

(In reply to Mathew Hodson from comment #11)

(In reply to Sandy Balzer from comment #0)

Impact

We assign a low risk to the ecosystem by these mis-issued certificates because all certificates on the list from Ben are already revoked. We can confirm that the system revoked these certificates within seconds because the defined post checks detected a mismatch.

Further investigation is ongoing (the cert list in the attachment is not final yet and contains only the confirmed facts by ongoing analysis)

Timeline

2023-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2023-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

Are you able to provide an updated and corrected timeline? Those dates are incorrect, right? This bug was opened in 2024. The timeline should include all relevant past events that led up to the incident and when it started, such as when the certificates were issued, when software was updated, when the fix for the character check was deployed, etc.

The impact section should include details about what was impacted. Since this was certificate mis-issuance, I would expect at least a description of how many certificates were mis-issued and the dates of the first and last mis-issuance.

Hi Mathew, thank you for your questions, please find our answers below.

Yes, correct: there are errors in the dating of the timeline. We appreciate your input to improve this Bugzilla. Here is the corrected timeline:

Timeline

2023-05-08:

  • Implementation of the update that introduced the bug

2023-06-09:

  • 13:16 UTC First mis-issued certificate

2023-09-15:

  • 12:30 UTC Last mis-issued certificate

2023-09-20:

  • Fix for the conversion of uppercase to lowercase letters in the interface

2024-01-10:

  • 22:51 UTC E-mail sent by Ben Wilson

2024-01-11:

  • 07:44 UTC employees of SwissSign processed above e-mail from Ben Wilson.
  • 10:00 UTC Compliance confirms mis-issuance and starts mis-issuance process
  • 13:30 UTC Information to audit body
  • 15:25 UTC Posting of this Bugzilla

2024-02-20:

  • Fix for the bug implemented

As for the impact section: in the first post we wrote that there are 44 certificates and that they were revoked within a few seconds. We also added the list of the certificates as an attachment. In comment 1 we wrote "our internal investigation shows, no further certificates were impacted." So the total number of certificates has not changed since this Bugzilla was opened. Dates have been added to the timeline for a better overview.

No Updates.

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | Done |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | 2023-09-20 | Done |
| 4. When post-linting fails, an alarm is raised and the internal mis-issued certificate process is triggered automatically | Mitigate | latest 2024-03-31 | ongoing |
| 5. Implement fix for bug | Mitigate | implemented 2024-02-20 | Done |

Update: The automatic alarm that triggers the internal certificate mis-issuance process was successfully implemented and tested on March 19.

Action Items

| Action Item | Kind | Due Date | Status |
|---|---|---|---|
| 1. Looking for other affected certificates | Detect | 2024-01-12 | Done |
| 2. Root cause analysis | Detect | next 2024-01-31 | Done |
| 3. Fixing conversion of uppercase to lowercase letters | Detect | 2023-09-20 | Done |
| 4. When post-linting fails, an alarm is raised and the internal mis-issued certificate process is triggered automatically | Mitigate | 2024-03-21 | Done |
| 5. Implement fix for bug | Mitigate | implemented 2024-02-20 | Done |

With this, all open Action Items are done.
Unless there are further questions, we would request this Bugzilla to be closed.

I'll close this next Wednesday, 27-Mar-2024, unless there are more items to discuss.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED