Telia: Certificates with RSA keys where modulus is not divisible by 8
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: pekka.lahtiharju, Assigned: pekka.lahtiharju)
Details
(Whiteboard: [ca-compliance] [ov-misissuance] [dv-misissuance])
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763
Steps to reproduce:
Request a certificate with an RSA key whose modulus size in bits is not divisible by 8
Actual results:
Certificate was issued
Expected results:
Request should have been rejected
Comment 1•4 years ago (Assignee)
- How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
We run a zlint script every night to verify that there are no issues in our SSL certificates. It reported a "not-divisible-by-8" error on 2020-10-19 just before midnight. The next morning, 2020-10-20, our production team handled the alarm, and response actions were started that same day.
- A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
Timezone is EEST below:
2020-10-19 12:35 Customer created the first problematic certificate
2020-10-19 23:58 Automatic zlint script generated the first email alarm about the issue.
2020-10-20 7:46 Our production specialist forwarded the first issue to our PKI team
2020-10-20 11:42 Customer created the second problematic certificate
2020-10-20 15:21 Initial analysis by PKI team was ready. This was a real issue. Revocation is required and we have to find the reasons behind this to prevent this and verify that there aren't similar problems elsewhere.
2020-10-20 afternoon Customer was contacted to agree about revocation and to instruct to avoid further invalid keys
2020-10-20 23:58 Automatic zlint script generated the second email alarm about the issue
2020-10-21 7-8 Our production specialist found the second alarm
2020-10-21 9-10 Both invalid certificates were revoked
2020-10-22 Full scan of our database to verify that there aren't any older certificates having this issue. None were found.
ongoing: solution to prevent this problem permanently (check step 7)
- Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
The work to prevent this permanently is still ongoing. Until then we have to rely on the nightly zlint checks and revoke any invalid certificate as soon as possible on the following day.
- In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.
Total 2 invalid certificates so far. First issue happened 2020-10-19 12:35 and the second 2020-10-20 11:42.
- In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
https://crt.sh/?id=3530121566
https://crt.sh/?id=3534830761
- Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
We had noticed that this requirement was published in the Mozilla CA Policy and later also in the BRs, but we had incorrectly assumed that the key quality check in our CA software, called "nist-public-key-validation", would prevent such keys. The fact that our zlint scans had never found this problem supported the incorrect assumption. None of the requests submitted to our system so far had used this kind of invalid key. We would have found them, because we have been running regular zlint verification for years and have re-linted all older certificates in our database after each zlint update.
- List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
We immediately initiated after the first discovery:
a) Change request to our self-made verification system to start preventing all not-divisible-by-8 issues. We estimate this to be ready before 2020-11-13.
b) We asked our CA vendor whether they could prevent this. No answer yet.
c) We instructed the affected customer about the key generation rules so that at least they won't create the problem anymore
d) We revoked the invalid certificates swiftly and are prepared to revoke if new ones occur before the fix (a)
e) We ran a new full zlint scan of all our certificates to find this or any similar problem. No problems were found.
f) We updated BR-self-assessment 2020-10-15 to verify that all BR/Mozilla rules are followed: https://support.trust.telia.com/download/CA/Telia_CA_BR_Self_Assessment.pdf
g) We carefully read other bug reports related to the same issue to find useful guidance
h) We added not-divisible-by-8 test to our test program
i) Since August 2020 we have had a new person concentrating on BR and policy issues. Our capability to follow bug discussions is now much better. We'll join the CA/Browser Forum soon to discuss certificate issues.
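For readers unfamiliar with the rule behind step (a), here is a minimal sketch of the kind of check involved. It is not Telia's implementation: the function name and the 2048-bit minimum default are assumptions made for illustration, and the sample integers are only chosen for their bit lengths, they are not real RSA moduli.

```python
def rsa_modulus_is_acceptable(modulus: int, min_bits: int = 2048) -> bool:
    """Return True if the modulus is at least min_bits long and its
    size in bits is evenly divisible by 8 (hypothetical helper)."""
    bits = modulus.bit_length()
    return bits >= min_bits and bits % 8 == 0


# Illustrative values only: integers with a known bit length.
ok_modulus = (1 << 2047) | 1   # bit length 2048, divisible by 8
bad_modulus = (1 << 2048) | 1  # bit length 2049, not divisible by 8

print(rsa_modulus_is_acceptable(ok_modulus))
print(rsa_modulus_is_acceptable(bad_modulus))
```

A check like this would need to run on every request before issuance, in each application that accepts CSRs, rather than only in post-issuance linting.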
Comment 2•4 years ago
If this issue was discovered and acted upon by the PKI team on the 20th then why did it take Telia 11 days to report this here? Even when both certificates were revoked it still took Telia 10 days to report it here.
It also seems worrying that such an assumption was made when there have been previous bugs reported here with the same issue, bug 1651132, bug 1653475 and bug 1653504. Does Telia regularly check incident reports posted here by other CAs?
Comment 3•4 years ago (Assignee)
We made the decision to report this issue here already on 2020-10-20, and the task was assigned to me. I simply concentrated too much on the recovery actions listed in step 7. I was also hoping the software fix would be ready before this report, but when it was delayed because of resourcing issues I finally wrote the report. We'll be faster with the report next time.
Telia does check incident reports by other CAs, but sometimes it is hard to know whether an issue could also apply to Telia. We have also allocated a new person to this task, and he will greatly improve our quality in this area from now on.
Comment 4•4 years ago (Assignee)
Status: The software fix has now been applied to two of the three applications where it is needed. The last one (ACME Server) will be done next week. I have now trained another Telia PKI team member to help me create Mozilla incident reports and review CA bugs from other CAs.
Comment 5•4 years ago (Assignee)
We have now updated all Telia applications that receive CSRs to enforce the divisible-by-8 requirement as expected. From our point of view this is closed now.
Comment 6•4 years ago
Closing this on or about 13-Nov-2020, unless any other remaining issues are raised.
Comment 7•4 years ago (Assignee)
- How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
Telia became aware of the problem from the incident report email from Ben Wilson, sent Fri 29 Jan 2021 17:48 EET.
- A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
Times below are EET (UTC+2)
[Mon 25.01.2021 12:57:18] Certificate was created using Telia ACME solution
[Mon 25.01.2021 23:58:02] Telia lint checker found the erroneous certificate and sent an email alarm as specified
[Fri 29.01.2021 17:48] Telia got normal level incident report email from Ben Wilson
[Mon 1.2.2021 8-10] Incident was evaluated by Telia PKI team and confirmed to require actions; reasons were investigated, corrective actions were started, similar cases investigated (none was found).
[Mon 1.2.2021 9:55:38] Illegal certificate was revoked
[Mon 1.2.2021 9-10] Better lint alarm rules were established at Telia: the rules were enhanced so that further incidents of a similar kind are noticed
[Mon 1.2.2021 9-10] Bug fix preventing this error was initiated
- Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.
Telia has initiated a fix in the ACME server for the root cause; it is currently under development. All other Telia SSL processes (SSL order, SSL self-service) were immediately tested and similar behavior was not found. In addition, Telia has improved its processes so that further similar incidents are found and illegal certificates are revoked swiftly.
- In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.
Certificate https://crt.sh/?id=3979591389&opt=zlint had the issue. All active Telia SSL certificates were re-linted to be sure that this is the only one.
- In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
https://crt.sh/?id=3979591389
- Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
This was a combination of two problems: a) the Telia ACME server let this happen, b) the Telia lint alarm was not noticed by the Telia PKI team.
When Telia originally implemented the check that prevents the use of P-521 keys, it was tested in the systems that existed at the time (SSL order, SSL self-service). Recently Telia implemented a third method to automate SSL certificate creation: the ACME server. The specification for the P-521 prevention code was adopted from the other certificate software, but it looks like the prevention is not working for ACME. We are currently investigating why. Telia verified that the other Telia code (SSL order, SSL self-service) is still preventing P-521. Tests with P-521 over ACME were never done because the same configuration prevented it in the other use cases.
Telia uses daily lint checking. It found this problem as expected and sent the alarm to the persons responsible for handling such alarms. Unfortunately the person in charge didn't notice this alarm because of a human error. All previous lint errors have been noticed, like the previous case in Oct 2020: https://bugzilla.mozilla.org/show_bug.cgi?id=1674536
- List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
Telia will make (or has made) several improvements to prevent similar incidents in the future. Telia has already verified that this was the only lint error that went unnoticed. Telia has also verified that the other SSL processes prevent P-521 usage. Telia has revoked the problematic certificate. The improvements are:
a) Telia ACME server will be fixed ASAP to prevent P-521 usage
b) P-521 test case in ACME channel is added to regular test scenarios
c) Lint error alarms are followed by three other persons in addition to the one in charge of those
d) A new regular monthly lint check (previously quarterly) will be done from now on to verify that no lint problems have gone unnoticed, and to catch problems that an updated lint version can detect but earlier versions could not.
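Improvements (a) and (b) can be illustrated with a minimal sketch, assuming P-256 and P-384 are the only curves the CA wants to accept and that every intake channel goes through one shared validator. The channel names, `KeyRejected`, `validate_ec_curve`, and `accept_request` are hypothetical names for illustration, not Telia's code.

```python
# Assumption: only these named curves are acceptable; P-521 is not.
ALLOWED_CURVES = frozenset({"P-256", "P-384"})

# Hypothetical identifiers for the three intake channels named in the report.
CHANNELS = ("ssl-order", "ssl-self-service", "acme")


class KeyRejected(Exception):
    """Raised when a requested public key violates policy."""


def validate_ec_curve(curve: str) -> None:
    """Reject any curve that is not on the allowlist."""
    if curve not in ALLOWED_CURVES:
        raise KeyRejected(f"curve {curve} is not permitted")


def accept_request(channel: str, curve: str) -> bool:
    """Hypothetical single entry point shared by every channel."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    try:
        validate_ec_curve(curve)
    except KeyRejected:
        return False
    return True


# Regression check in the spirit of (b): exercise every channel, not only
# the ones that existed when the rule was first implemented.
for channel in CHANNELS:
    assert not accept_request(channel, "P-521")
    assert accept_request(channel, "P-384")
```

The point of looping over all channels is that a rule verified in two entry points says nothing about a third one added later.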
Comment 8•4 years ago (Assignee)
I accidentally added the incident report in comment 7 to this bug when it should have been only in https://bugzilla.mozilla.org/show_bug.cgi?id=1689589