- How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
We noticed the issue during an escape analysis after deploying a SEV1 storefront fix unrelated to validation. The issue was missed during the original testing: the patch applied to a storefront caused issuance to skip the new domain validation system if the certificate was never-before-seen and was organization validated. We originally thought the issue related to how domain validation evidence was stored, but during the investigation we realized that the storefront had skipped domain validation entirely. This led to the mis-issuance of 123 OV certificates and 36 EV certificates. We have been monitoring certificate issuance for problems like this since we deployed the domain validation consolidation, which is why we caught it during the escape analysis.
- A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
2019-11-04 – A SEV1 outage was reported for a storefront. The fix was deployed after hours with targeted testing instead of regression testing, which is standard for our SEV1 issues. Unfortunately, the patch caused the storefront to send certificate information directly to issuance, along with the claimed evidence of domain validation, rather than to the validation system.
2019-11-07 – The problem was discovered during an escape analysis. From the initial investigation, it looked like the validation evidence storage was at issue. We rolled back the patch while investigating further.
2019-11-08 – We determined the issue was with domain validation but were not yet sure of the impact. We continued to investigate which certificates were impacted and the conditions under which validation was skipped.
2019-11-11 – A final list of impacted certificates was reported, and an incident report was written.
2019-11-12 – All impacted certificates were revoked, within 24 hours of determining which certificates were affected.
- Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
A deployment on 2019-11-07, as detailed above, reverted the patch and removed the problematic code.
- A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
123 OV and 36 EV certificates. Per the timeline above, the affected certificates were issued between the patch deployment on 2019-11-04 and the rollback on 2019-11-07. I'm working on getting crt.sh links and will post them as an attachment.
- The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
- Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
We had a SEV1 issue that was escalated, and the team fixed it after hours. We performed targeted testing but no regression testing of how the change would impact other systems. Unfortunately, the impact was that the storefront began submitting certificate requests directly for issuance, skipping the validation system. Despite having good unit test coverage within each application, we lack good cross-system automated tests, mostly because of the number of storefronts.
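To make the failure mode concrete, the routing bug can be sketched roughly as below. This is a simplified, hypothetical illustration; the function and field names (`route_request`, `request["type"]`, `seen_before`) are invented for this sketch and do not reflect our actual storefront code.

```python
def route_request(request, seen_before):
    """Decide where a storefront sends a certificate request.

    Intended behavior: every request goes to the validation system
    first. The patched storefront instead short-circuited straight to
    issuance for never-before-seen, organization-validated requests.
    """
    if not seen_before and request["type"] in ("OV", "EV"):
        return "issuance"  # BUG: bypasses the domain validation system
    return "validation"
```

Because the skip only triggered for never-before-seen, org-validated requests, targeted testing of the storefront fix (which exercised existing customers) did not hit the broken path.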
- List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
The immediate fix is to add a report to our canary platform that will identify issues at this integration point on an ongoing basis. This will provide out-of-band alerts while further system consolidation is performed, which will allow better testing around these integration points. The addition to our canary platform will be in place by 2019-11-16.
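The canary check amounts to cross-referencing recent issuances against stored validation evidence. A minimal sketch of that reconciliation follows; the function name, data shapes, and evidence format are assumptions for illustration, not our internal APIs.

```python
def find_unvalidated_issuances(issued_certs, validation_evidence):
    """Flag issued certificates that lack domain validation evidence.

    issued_certs: iterable of dicts with 'serial' and 'domains' keys
    validation_evidence: dict mapping domain -> evidence record
    Returns one alert per certificate with any unvalidated domain.
    """
    alerts = []
    for cert in issued_certs:
        missing = [d for d in cert["domains"] if d not in validation_evidence]
        if missing:
            alerts.append({"serial": cert["serial"], "missing_domains": missing})
    return alerts
```

Run on a schedule against the issuance log, any non-empty result would page the on-call team, independent of the storefront code paths that caused this incident.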
In addition, we need to build better automated cross-system tests. These are more complicated because of the number of storefronts, but we plan to work on them in parallel with the storefront system shutdown.
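The kind of cross-system test we have in mind would assert that no request can reach issuance without passing through validation. The following is an illustrative sketch only, using invented stand-in components (`FakeValidationSystem`, `FakeIssuanceSystem`) rather than our real services.

```python
class FakeValidationSystem:
    """Stand-in validation service that records validated domains."""
    def __init__(self):
        self.validated = set()

    def validate(self, domain):
        self.validated.add(domain)
        return {"domain": domain, "method": "dns-01"}


class FakeIssuanceSystem:
    """Stand-in issuance service that refuses unvalidated domains."""
    def __init__(self, validation):
        self.validation = validation
        self.issued = []

    def issue(self, request):
        for domain in request["domains"]:
            if domain not in self.validation.validated:
                raise RuntimeError("issuance without validation: " + domain)
        self.issued.append(request)


def submit_from_storefront(request, validation, issuance):
    """The correct path: storefront -> validation -> issuance."""
    for domain in request["domains"]:
        validation.validate(domain)
    issuance.issue(request)
```

A test suite built on this pattern would have caught the incident: issuing directly, as the patched storefront did, raises instead of silently succeeding.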