Closed Bug 1844514 Opened 1 year ago Closed 1 year ago

MICROSEC: Incident report - No OCSP status response for 2 Precertificates

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: szoke.sandor, Assigned: szoke.sandor)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Steps to reproduce:

The Microsec OCSP responder failed to return a correct response for two precertificates.

MICROSEC INCIDENT REPORT - No OCSP status response for 2 Precertificates


I -- How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

Microsec was informed by phone that two Microsec OCSP problems had been reported on the following site:
https://sslmate.com/labs/ocsp_watch/


II -- A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2023-07-18 19:55 CET
  • receive a notification phone call about the problem
2023-07-18 19:57 CET
  • Microsec opened an internal JIRA ticket to record the problem
2023-07-18 20:11 CET
  • initiating an investigation to identify the cause(s) of the problem and to prevent further similar errors
2023-07-18 20:49 CET
  • information collected about the problematic precertificates
2023-07-18 20:56 CET
  • finding the reason of the problem
2023-07-18 21:00 CET

III -- Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

  • The two problems occurred at different times, so they were independent events.
  • The investigation started after office hours, when no certificates are issued.
  • A temporary fix was applied very quickly, so there was no need to stop certificate issuance.

IV -- A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

2022-12-16
2023-04-14

V -- The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

domain crt.sh link
dtk.kszdr.gov.hu https://crt.sh/?id=8214560966
smtp1.mkb.hu https://crt.sh/?id=9146975721

VI -- Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We performed the initial investigation and found the following:

  • The CA log entries show that in both cases an error occurred during certificate issuance:

-- the precertificate was created successfully

-- the precertificate was transmitted to at least one log server successfully

-- the CA software could not reach the necessary number of log servers

-- the certificate issuance process was terminated with an error status

-- the TLS certificate was not issued

-- due to the improper error-handling flow in the CA software, the precertificate was not added to the OCSP responder's database.

  • After the unsuccessful issuance, the CA created a new precertificate with the same public key and a new serial number, and with that precertificate the certificate issuance succeeded.

Summary of the findings

The problem was caused by a configuration problem in the CA program

  • the precertificate was not added to the OCSP responder's database when at least one log server returned an SCT but the necessary number of log servers could not be reached

VII -- List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Immediate actions

  • Microsec added the two missing precertificates to its OCSP responder's database
  • Microsec revoked the two problematic precertificates immediately
  • A quick initial investigation was made to find the cause of the problem.
  • Microsec identified the causes of the problem as described above.
  • Microsec made a quick fix to the CA program, which reduces the chance of this type of problem recurring
  • Microsec opened an incident bug in Mozilla's Bugzilla with the present report.

Further planned actions

Deadline: 2023-08-20
  • Microsec will investigate the CA software in more detail and make further changes if necessary to prevent this problem from happening again.
  • Microsec will develop an automatic tool to check https://sslmate.com/labs/ocsp_watch/ daily
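
As an illustration only, a daily check of this kind could be sketched as below. This is not Microsec's tool. It assumes that a plain case-insensitive text search of the page for a CA name is sufficient; the layout of the OCSP Watch page is not described in this report, and the names fetch_page, find_mentions and check_once are hypothetical.

```python
import urllib.request
from typing import Callable, List

# URL taken from the report above.
OCSP_WATCH_URL = "https://sslmate.com/labs/ocsp_watch/"


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download the page body as text (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def find_mentions(page_text: str, keyword: str) -> List[str]:
    """Return every line of the page that mentions the keyword.

    Assumption: a case-insensitive substring match on the CA name is enough
    to spot a relevant entry; the real page structure may need proper parsing.
    """
    needle = keyword.lower()
    return [line.strip() for line in page_text.splitlines() if needle in line.lower()]


def check_once(
    keyword: str,
    url: str = OCSP_WATCH_URL,
    fetch: Callable[[str], str] = fetch_page,
) -> List[str]:
    """Run a single check; a non-empty result means someone should be notified."""
    return find_mentions(fetch(url), keyword)
```

A scheduler could call check_once("Microsec") once a day and send a notification email whenever the returned list is not empty. The fetch parameter is injectable so that the matching logic can be tested without network access.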

Assignee: nobody → szoke.sandor
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [ocsp-failure]
Type: defect → task

STATUS REPORT - 2023-08-09

actions

2023-07-18
2023-07-20
  • Microsec finished the deep investigation and was able to find a final solution to eliminate the root cause of this problem.
  • Microsec decided to improve the certificate issuance process. The pre-certificate will be added to Microsec's internal certificate status repository dedicated to OCSP responders immediately after its creation, before it is sent to the CT log servers, so it will be impossible to publish a pre-certificate without OCSP based revocation status information.
2023-07-31
  • the improved CA program was installed in the live environment
  • started live testing of the new solution
2023-08-07
  • Microsec finished the one-week live test, and the change was approved
2023-08-09
  • there were no further OCSP problems reported on this site up to now
  • Microsec sends this status report to the opened Mozilla bug

Further planned actions

Deadline: 2023-08-20

STATUS REPORT - 2023-08-22

actions

2023-08-09
2023-08-22
  • there were no further OCSP problems reported on this site up to now
  • Microsec sends this status report to the opened Mozilla bug
  • there are still open issues with the automated testing tool subscription

Further planned actions

Deadline: 2023-08-31
  • Microsec plans to resolve the subscription issues with the automated tool that checks https://sslmate.com/labs/ocsp_watch/ daily and reports any issues via email
  • Microsec also continues with manual checks until the automated tool subscription is finalized

Thank you for your report and updates!

I have a few things I'd like to get some clarification on if you don't mind:


Was the issue related to how your software was configured, or an issue with the code?

You seem to state:

The problem was caused by a configuration problem in the CA program

but also that you made changes to the program itself.


I understand that your issuance process is currently manual, and there's no automation end-to-end. How do you handle errors that come up in general in various steps of your issuance pipeline?

It seems that you didn't notice the certificate failing to be issued, but you noticed the problem only because of the OCSP server. Now that OCSP works, how will you catch and correct failures in your issuance the next time they happen?

The root cause of appearing on OCSP Watch has been found, but what about the root cause of error handling in the issuance pipeline? It seems like there is some masking of symptoms here but the underlying problem is still out there.

Have you done any reviews or have you investigated your error handling in the pipeline to ensure it exists, it is compliant, and it implements the correct business logic, and alerts when needed?

Flags: needinfo?(szoke.sandor)

Thank you for your feedback and questions, and sorry for our late answer.


Configuration change via CA program change

Originally during the issuance process our CA application

  • created the precertificate
  • tried to send it to 3 different CT log servers
  • in case of successful CT log server registration, created the TLS certificate and added it to our certificate store
  • in case of faults, added the precertificate to our certificate store

During the two problematic issuances, an unexpected type of error occurred that the CA application did not handle correctly, and it terminated without adding any certificate to our certificate store. Our RA system received an error message and requested the issuance of the same certificate again. The CA created new precertificates based on the new requests and, based on those, issued the correct TLS certificates to the customer. The certificate requests were therefore fulfilled and the problem remained undetected.

As a quick fix, we modified the configuration settings so that the error handling covers this type of error.
A deeper investigation later showed that a new, unknown type of error can occur at any time, so we decided to modify the CA program to handle any such errors in the future.
The new process is as follows:

  • the precertificate is created
  • the created precertificate is added to our internal certificate store
  • we try to send the precertificate to the required number of CT log servers
  • if the registration process is successful,
    • the CA issues the final TLS certificate with the received SCTs and
    • the precertificate is replaced by the TLS certificate in the certificate store
  • if the registration process is not successful,
    • the issuance process is terminated
    • the precertificate remains in the certificate store.
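
The revised flow above can be sketched as follows. This is a minimal illustration, not Microsec's CA code: the CertificateStore class, the CTSubmissionError type, the issue function and the required_scts parameter are all hypothetical names, and each CT log is modelled as a simple callable that returns an SCT or raises an exception.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class CTSubmissionError(Exception):
    """Raised when too few CT logs returned an SCT (hypothetical error type)."""


@dataclass
class CertificateStore:
    """Stand-in for the internal certificate store read by the OCSP responders."""
    entries: Dict[str, str] = field(default_factory=dict)

    def put(self, serial: str, kind: str) -> None:
        # Writing the same serial again replaces the earlier entry.
        self.entries[serial] = kind


def issue(
    serial: str,
    store: CertificateStore,
    logs: List[Callable[[str], str]],
    required_scts: int,
) -> List[str]:
    """Register the precertificate first, then try the CT logs."""
    # The precertificate is stored before any log sees it, so status
    # information exists even if everything after this line fails.
    store.put(serial, "precertificate")

    scts: List[str] = []
    for submit in logs:
        try:
            scts.append(submit(serial))
        except Exception:
            # Any error from one log, expected or not, is skipped here;
            # the store entry written above is unaffected.
            continue

    if len(scts) < required_scts:
        # Issuance is terminated; the precertificate remains in the store.
        raise CTSubmissionError(f"only {len(scts)} of {required_scts} SCTs received")

    # Success: the precertificate entry is replaced by the TLS certificate.
    store.put(serial, "certificate")
    return scts
```

Because the store is updated before the first submission, the ordering itself, rather than the completeness of the error handling, ensures that a logged precertificate always has a status entry.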

We think that this approach is not specific to OCSP responders and fully resolves this problem.


Automatic certificate issuance

Our issuance process in the CA application is not manual; the whole process is fully automatic.
The CA receives a signed certificate request from our RA system, and after making several checks it issues the requested certificate without any human interaction.
Human interaction is needed only in the RA part when there is a need for interaction with the customer or with third parties during the validation process.


Conclusion

I hope this fully answers your questions.


STATUS REPORT - 2023-09-01

Actions

2023-08-22
  • Microsec continues to check the page https://sslmate.com/labs/ocsp_watch/ manually every business day
  • Microsec continues negotiations regarding the subscription of the web testing service
  • the automated regular check still runs in test mode
2023-09-01
  • there were no further OCSP problems reported on this site up to now
  • there are still open issues with the automated testing tool subscription - no progress
  • Microsec sends this status report to the opened Mozilla bug

Further planned actions

Deadline: 2023-09-15
  • Microsec hopes to resolve the subscription issues with the automated tool that checks https://sslmate.com/labs/ocsp_watch/ daily and reports any issues via email. If this cannot be solved by that time, Microsec will discuss the possibility of developing its own test tool or finding another service
  • Microsec also continues with manual checks until the automated tool subscription is finalized

STATUS REPORT - 2023-09-15

ACTIONS

2023-09-01
  • Microsec continues to check the page https://sslmate.com/labs/ocsp_watch/ manually every business day
  • Microsec continues negotiations regarding the subscription of the web testing service
  • the automated regular check still runs in test mode
2023-09-12
  • there were no further OCSP problems reported on this site up to now
  • there are still open issues with the automated testing tool subscription - no progress
  • Microsec activated another testing tool to check the page https://sslmate.com/labs/ocsp_watch/ automatically every day. If any Microsec-related issue appears, the program sends us a notification email.
2023-09-13
  • we discussed this incident with our auditor as a topic of our conformity assessment audit, and they accepted our solution
  • the incident will be listed in our next AAL in 2023Q4
2023-09-15

FURTHER PLANNED ACTIONS

  • Microsec does not plan to take any further action regarding this incident.

OVERALL STATUS

  • [x] Solving the problem with the two reported pre-certificates
  • [x] Reporting the incident to Mozilla
  • [x] Finding the root cause of the problem
  • [x] Improving the Certificate issuance process
  • [x] Setting up automatic testing tool

Flags: needinfo?(szoke.sandor)

Thanks for your update. It appears all planned actions have been implemented. If there are no other questions or suggestions, I anticipate closing this on Friday, 29-Sept-2023.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Depends on: 1884702
No longer depends on: 1884702