SecureTrust: Incorrect OCSP response
Categories: CA Program :: CA Certificate Compliance, task
Tracking: Not tracked
People: Reporter: u671069, Assigned: u671069
Details: Whiteboard: [ca-compliance] [ocsp-failure]
-
How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
On April 6, 2022, at 21:00 CDT, intermittent connection issues began to appear in a log for the database that transfers, tracks, and provides statistics on OCSP responses generated by the CA and delivered to the OCSP endpoint servers. The CA was still refreshing the OCSP responses, but a few responses were dropped during propagation to the OCSP service endpoints because a corrupted statistics table in that database was causing the database service to restart.
-
A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
March 21, 2022 – SecureTrust CA migrated some customer-facing services to new servers during the purchase transition
April 6, 2022 21:00 CDT – Intermittent connection issues were seen in a log related to the OCSP transfer database
April 9, 2022 08:00 CDT – Dumped the corrupted database statistics table and began restoring it to another database server
April 13, 2022 17:00 CDT – Started seeing an external impact on OCSP responses related to the intermittent connection issues
April 13, 2022 18:30 CDT – Placed the new database server online with the corrected database and began a forced OCSP refresh
-
Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
N/A - No problem with issued certificates
-
A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
N/A - No problem with issued certificates
-
The complete certificate data for the problematic certificates.
N/A - No problem with issued certificates
-
Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
We have a database that tracks the transfer of OCSP responses between the CA and the OCSP service endpoints. This database also tracks statistics about requests for OCSP responses from the endpoints. The statistics table in the database was corrupted during our migration of servers in the purchase transition. The corruption caused a service to fail intermittently, which in turn caused the occasional loss of a refreshed OCSP response in transit from the CA to the OCSP endpoint. Our system initially recovered from these losses without customer impact, because our OCSP responses are set to refresh at no more than one-half the lifetime of the OCSP response and typically refresh at one-quarter of that lifetime. On April 13, the loss became externally visible: more refreshed responses were being lost, and the repeated refresh cycles were no longer replacing them before the prior response expired.
Our OCSP monitoring included jobs to monitor OCSP response endpoint health, refresh rates, and validity to provide early detection of any possible issue before it would have been seen by our customers. This monitoring did not include the entire valid certificate population and therefore was not helpful in early detection of this intermittent problem.
-
List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
Until now we monitored OCSP response endpoint health only. Going forward, we will monitor our entire corpus of valid certificates and flag every response older than half its lifetime. This will alert us sooner should another intermittent problem occur, so we can address it before any OCSP response refresh issue becomes externally visible.
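The check described above (flag any response older than half its lifetime) can be sketched as follows. This is a minimal illustration, not SecureTrust's monitoring code: the `OcspRecord` type, the `is_stale` and `find_stale` names, and the assumption that each response's thisUpdate and nextUpdate times are already available in memory are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class OcspRecord:
    """Hypothetical record of the response currently served for one certificate."""
    serial: str
    this_update: datetime  # thisUpdate field of the OCSP response
    next_update: datetime  # nextUpdate field of the OCSP response


def is_stale(record: OcspRecord, now: datetime) -> bool:
    """Return True if the response is older than half of its lifetime."""
    lifetime = record.next_update - record.this_update
    age = now - record.this_update
    return age > lifetime / 2


def find_stale(records: list[OcspRecord], now: datetime) -> list[str]:
    """Return the serials of all certificates whose response should be flagged."""
    return [r.serial for r in records if is_stale(r, now)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Illustrative data only: two responses with a four-day lifetime.
    corpus = [
        OcspRecord("01", now - timedelta(days=1), now + timedelta(days=3)),
        OcspRecord("02", now - timedelta(days=3), now + timedelta(days=1)),
    ]
    for serial in find_stale(corpus, now):
        print(f"stale OCSP response for serial {serial}")
```

Running the check over every valid certificate, rather than a sample, is what distinguishes it from the earlier monitoring: an intermittent drop affecting only a few certificates would still show up as a flagged serial.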
Comment 2•3 years ago
Thanks for filing this incident report, but it's unclear from Comment 1 what the impact of this issue was. What does it mean for a response to be "dropped"? Did that lead to an expired OCSP response being served, a response with an error, or something else?
OCSP Watch is currently tracking 13 certificates issued by SecureTrust on March 22 for which the OCSP responder is returning "unauthorized". Is that related to this incident?
The external effect of a dropped OCSP response depended on the scenario:
- If the dropped OCSP response was the first OCSP response for the issued certificate, it was serving an “unauthorized” response until a refresh response was received. After the refreshed response was received, it would function properly.
- If the dropped OCSP response was a refreshed response and the prior recorded OCSP response was not expired, the OCSP response would function properly.
- If the dropped OCSP response was a refreshed response and the prior recorded OCSP response was expired, it was serving an expired OCSP response until a new refreshed response was received. After the refreshed response was received, it would function properly.
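The three scenarios above can be summarized in a short sketch. The names `Served` and `served_after_drop` are hypothetical, and the function only models the externally visible outcome as described in this comment; it is not the responder's actual logic.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Served(Enum):
    UNAUTHORIZED = "unauthorized"
    VALID_PRIOR = "valid prior response"
    EXPIRED_PRIOR = "expired prior response"


def served_after_drop(prior_next_update: Optional[datetime], now: datetime) -> Served:
    """Model what the endpoint serves after a response was dropped in transit.

    prior_next_update is the nextUpdate time of the last response the endpoint
    recorded, or None if the dropped response was the first one for the
    certificate.
    """
    if prior_next_update is None:
        # First response never arrived: nothing on record for this certificate.
        return Served.UNAUTHORIZED
    if now <= prior_next_update:
        # The prior response is still within its validity period.
        return Served.VALID_PRIOR
    # The prior response has passed its nextUpdate time.
    return Served.EXPIRED_PRIOR


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(served_after_drop(None, now).value)
```

In each case the outcome lasts only until the next refreshed response reaches the endpoint.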
Thank you for creating the OCSP Watch tracking tool. We will add the discussion of these 13 certificates to this bug, as the purchase transition affected them as well.
The 13 precertificates with OCSP responses of “unauthorized” were never issued as final certificates, which is an isolated edge case. A new firewall added to the configuration was too restrictive and dropped requests to some of the required CT logs, which prevented final issuance. The firewall setting was adjusted to allow logging to all the CT logs, and we are working on a resolution for these precertificates.
We have discussed implementing a short-term solution for these 13 isolated edge-case precertificates that were never issued as final certificates. However, given that we are currently focusing on designing and implementing a robust long-term solution for item 8 of Mozilla’s Root Store Policy v2.8, we decided it would be more prudent to focus on that solution, and address these 13 precertificates then.
Since the due date for completion of this Mozilla Root Store Policy item is October 1, 2022, could you please set the Next Update to September 1, 2022?
We are still on track for completion on the 9/28 release date. Could you please set that as the Next Update date?
Update: As part of our release, we added the ability to provide CRL and OCSP responses in the case where a precertificate was generated, but the final certificate was never issued.
The 13 precertificates without a corresponding issued certificate which were serving an OCSP response of unauthorized have been revoked:
https://crt.sh/?id=6393571123
https://crt.sh/?id=6393460220
https://crt.sh/?id=6393460223
https://crt.sh/?id=6393427608
https://crt.sh/?id=6393427602
https://crt.sh/?id=6393427620
https://crt.sh/?id=6393384069
https://crt.sh/?id=6393384043
https://crt.sh/?id=6393384064
https://crt.sh/?id=6393384050
https://crt.sh/?id=6393384122
https://crt.sh/?id=6393384070
https://crt.sh/?id=6393383978
Comment 9•3 years ago
Thanks. Do you have any other remediation tasks planned? If not, then I am inclined to close this bug sometime this week. Also, should we start referring to this and other pending incidents under a CA operator name of Viking Cloud?
Comment 10•3 years ago (Assignee)
Yes, unless there are additional comments, please close this bug, as there are no other remediation tasks for this incident. As our rebranding is still in process, any future incidents will be marked as VikingCloud.