FNMT: Incorrect publication of information for Test Website - Valid
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: amaya.espinosa, Assigned: amaya.espinosa)
Details
(Whiteboard: [ca-compliance] [policy-failure])
Preliminary Incident Report
The FNMT has received a notification regarding the incorrect publication of information for "Test Website - Valid". The site registered as the "Test Website - Valid" value in the CCADB for AC RAIZ FNMT-RCM SERVIDORES SEGUROS was serving an expired certificate.
We have diagnosed and solved the problem and a full report with the findings and corrective actions implemented will be issued in the next few days.
Comment 1•1 year ago
Automated monitoring, as suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=1925239#c2, can permanently prevent this type of problem.
Comment 2•1 year ago
Thank you for your suggestion.
We agree that implementing monitoring can effectively prevent these types of issues in the future. We have reviewed and enhanced our monitoring systems to ensure better coverage.
Comment 3•1 year ago
Incident Report
Summary
The FNMT has received a notification regarding the incorrect publication of information for the "Test Website - Valid". After reviewing it, we confirmed that the "Test Website - Valid" URLs https://testactivetipo1.cert.fnmt.es and https://testactivetipo2.cert.fnmt.es were protected by expired certificates. This constitutes non-compliance with Section 2.2 (Publication of information) of the CA/Browser Forum TLS Baseline Requirements.
Impact
Only two certificates were affected: the Extended Validation (EV) and Organization Validation (OV) test website certificates. Because the incident had no impact on the PKI infrastructure, there was no need to stop issuance. The affected certificates are listed in the Appendix.
Timeline
All times are UTC.
2025-01-15:
- 07:16: New valid test certificates were issued because the existing test certificates were due to expire on January 26.
- 07:43: The new valid test certificates were installed for testactivetipo1.cert.fnmt.es and testactivetipo2.cert.fnmt.es, but only on the active web node.
2025-02-04:
- 17:17: Due to a bug, the TLS termination web cluster shifted traffic to the passive node, which had not been updated. As a result, the test websites began serving the expired certificates.
2025-02-07:
- 17:25: Compliance staff processed an email, received through the incident mailbox, reporting an error at testactivetipo1.cert.fnmt.es, the valid test website: its certificate appeared to be expired.
- 17:35: Compliance staff reviewed test websites to confirm non-compliance. They discovered that the URL testactivetipo2.cert.fnmt.es was also affected.
- 17:40: Compliance staff alerted the technical area.
- 18:10: Technical staff checked the infrastructure and updated the non-compliant node.
- 18:36: The cluster was rebalanced.
2025-02-08:
- 18:17: A reply was sent to the email received the day before.
2025-02-10:
- 13:40: A monitoring system was deployed to detect future issues.
- 15:56: A preliminary incident report was posted on Bugzilla.
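The durations implied by the timeline can be checked with a little date arithmetic. The sketch below assumes the non-compliance window runs from the failover at 17:17 on 2025-02-04 until the cluster was rebalanced at 18:36 on 2025-02-07; if the window is instead taken to end when the node was updated at 18:10, the figures are 26 minutes shorter. The helper name `utc` is illustrative only.

```python
from datetime import datetime, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timeline entry as a UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


# Timestamps taken from the timeline above (all UTC).
failover = utc("2025-02-04 17:17")    # cluster shifted to the passive node
processed = utc("2025-02-07 17:25")   # compliance staff processed the report email
rebalanced = utc("2025-02-07 18:36")  # cluster rebalanced (assumed end of exposure)

exposure = rebalanced - failover      # how long expired certificates were served
response = rebalanced - processed     # time from processing the report to the fix

print(exposure)  # 3 days, 1:19:00
print(response)  # 1:11:00
```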
Root Cause Analysis
Technical staff were following a renewal procedure to update the certificates, so we were confident that the change had been applied correctly. However, when we reviewed the procedure after the incident, we realized that several steps needed improvement. Under normal conditions the renewal would not have caused a problem, because the certificates were installed on the active node. The procedure should have ensured that certificates were updated on both nodes, so that a switch to the passive node could not expose an expired certificate.
Furthermore, although the website certificates were issued in time, we cannot rely solely on human technical expertise. Several websites are already monitored by an external probe, but we need to verify that all relevant websites are monitored. A monitoring system covering the test websites would have detected the non-compliance.
To summarize, we detected two main root causes:
- The procedure did not cover all the steps for certificate renewal in the infrastructure.
- There was no monitoring system for the test websites.
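The first root cause suggests a post-renewal check that every node in the cluster serves the same certificate. The following is a minimal sketch of such a check, not the procedure FNMT adopted: it connects to each node directly, reads the leaf certificate, and compares SHA-256 fingerprints against the expected one (for example, a fingerprint as shown on crt.sh). The node names and addresses in the usage note are hypothetical, and the sketch assumes each node is reachable on its own address.

```python
import hashlib
import socket
import ssl


def fetch_cert_sha256(node_ip: str, server_name: str, port: int = 443,
                      timeout: float = 5.0) -> str:
    """Return the SHA-256 fingerprint of the leaf certificate served by one node."""
    # Verification is disabled on purpose: an expired certificate must still be
    # retrievable so it can be compared, not rejected during the handshake.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((node_ip, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=server_name) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest().upper()


def find_stale_nodes(fingerprints: dict[str, str], expected: str) -> list[str]:
    """Return the nodes whose served certificate differs from the expected one."""
    want = expected.upper()
    return sorted(node for node, fp in fingerprints.items() if fp.upper() != want)


def check_cluster(nodes: dict[str, str], server_name: str, expected: str) -> list[str]:
    """Fetch the certificate from every node and list the ones that are out of date."""
    served = {name: fetch_cert_sha256(ip, server_name) for name, ip in nodes.items()}
    return find_stale_nodes(served, expected)
```

A renewal runbook could call `check_cluster({"active": "192.0.2.10", "passive": "192.0.2.11"}, "testactivetipo1.cert.fnmt.es", expected)` as its last step and treat any non-empty result as a failed renewal; the two addresses are documentation-range placeholders.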
Lessons Learned
What went well
- No other test sites were impacted.
- New certificates had already been issued, so we could update the website certificates easily.
- Good communication between the compliance team and technical staff.
- Dedication of technical personnel to resolve the issue as quickly as possible.
What didn't go well
- Lack of a monitoring system on the test sites.
Where we got lucky
- Being notified of the issue just three days after the non-compliance began.
Action Items
| Action Item | Kind | Due Date |
|---|---|---|
| Implement test site monitoring systems to detect possible future problems | Detect | Done |
| Revise the procedure for certificate renewal in the infrastructure to include all steps | Prevent | Done |
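The report does not describe how the deployed monitoring works. As an illustration only, a probe for the "Detect" item could look like the sketch below: it performs a verified TLS handshake against each test website and classifies the certificate as `ok`, `expiring`, or `expired` from its `notAfter` field. The function names and the 30-day warning threshold are assumptions, and a verification failure may have causes other than expiry (for example, a root that is missing from the local trust store), so the probe reports the verifier's message rather than guessing.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_remaining(not_after: str, now: datetime) -> float:
    """Days until expiry, given a notAfter string as returned by getpeercert()."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def classify(not_after: str, now: datetime, warn_days: int = 30) -> str:
    """Map a notAfter value to 'expired', 'expiring', or 'ok'."""
    left = days_remaining(not_after, now)
    if left <= 0:
        return "expired"
    if left <= warn_days:
        return "expiring"
    return "ok"


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Handshake with full verification and report the certificate status."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as exc:
        # An already-expired certificate fails the handshake and lands here.
        return f"invalid: {exc.verify_message}"
    return classify(not_after, datetime.now(timezone.utc))
```

Running `probe("testactivetipo1.cert.fnmt.es")` and `probe("testactivetipo2.cert.fnmt.es")` on a schedule, and alerting on anything other than `ok`, would cover both the expired case and the approaching-expiry case.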
Appendix
These are the two test websites affected by the certificate expiration:
| Root CA | Type | URL |
|---|---|---|
| AC Servidores Seguros | EV | https://testactivetipo1.cert.fnmt.es |
| AC Servidores Seguros | OV | https://testactivetipo2.cert.fnmt.es |
Details of affected certificates
https://crt.sh/?sha256=[8EF3F2979C667612A93E0D90FC6ADD31E9EE3FCA7E7FEACACF1F1F42CF3FEB50]
https://crt.sh/?sha256=[3C9A084FFD7BF0C27D9A6AA19BBAEB848773210BD328A8C7FBA9360E191A5A0B]
Comment 4•1 year ago
Sorry, I have updated the details of the affected certificates. I am fixing a typo in the certificate URLs by removing the square brackets.
Details of affected certificates
https://crt.sh/?sha256=8EF3F2979C667612A93E0D90FC6ADD31E9EE3FCA7E7FEACACF1F1F42CF3FEB50
https://crt.sh/?sha256=3C9A084FFD7BF0C27D9A6AA19BBAEB848773210BD328A8C7FBA9360E191A5A0B
Comment 5•1 year ago
Report Closure Summary
- Incident description: The valid test websites https://testactivetipo1.cert.fnmt.es and https://testactivetipo2.cert.fnmt.es were protected by two expired certificates, resulting in non-compliance with Section 2.2 (Publication of information) of the CA/Browser Forum TLS Baseline Requirements.
  Although new valid test certificates were issued in time, they were installed only on the active node of the web cluster. Due to a bug, the cluster shifted traffic to the passive node and, as a result, the test websites served the expired certificates.
- Incident Root Cause(s): The root causes of this incident were:
  - The procedure for certificate renewal did not cover all necessary steps: Using an incomplete procedure led to unexpected mistakes.
  - Inadequate monitoring and detection: The lack of a monitoring system on the test sites made it difficult for technical staff to identify potential issues proactively.
- Remediation description: The following actions were taken to address the incident:
  - Updated the non-compliant node and rebalanced the cluster.
  - Implemented a monitoring system to detect non-compliance on test websites.
  - Revised the procedure for certificate renewal in the infrastructure to include all necessary steps.
- Commitment summary: The FNMT commits to:
  - Review and improve our procedures to ensure they cover all necessary steps.
  - Enhance monitoring capabilities to prevent similar incidents from occurring in the future.
All Action Items disclosed in this report have been completed as described, and we request its closure.
Comment 6•1 year ago
I intend to close this on Friday, 28-Feb-2025, unless there are issues or questions to discuss.