Microsoft PKI Services: Certificate Mis-Issuance, Locality Missing
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: johnmas, Assigned: johnmas)
Details
(Whiteboard: [ca-compliance] [ca-misissuance])
Attachments
(8 files)
Four files of 1.51 KB and four files of 2.63 KB, all of type application/x-x509-ca-cert.
Comment 8•5 years ago (Assignee)
- How your CA first became aware of the problem
This problem was noticed while running x509 lint tests on certificates that MS PKI created on April 28th, 2020 but did not issue to a customer. Linting found that the eight (8) issued (but not delivered) certificates contained an organization value but did not contain a locality or state or province value.
The error was caused because MS PKI manually issued these certificates outside of our normal automated issuance process, which meant that our certificate profile enforcement rules were not applied to the certificate request in an automated manner. Our manual pre- and post-issuance processes did catch the error, but only after the certificates were already created.
The certificates were issued outside of our automated certificate issuance system because it was built to only issue certificates to subscribers internal to Microsoft and would have imposed incorrect values on the certificates, like O=Microsoft Corporation. Since this issuance, to an entity outside of Microsoft, was intended to be a one-time issuance, it did not make sense to rewrite our logic to allow for the issuance of these certificates. Instead we used manual quality checks pre and post issuance to validate the certificates before they were delivered to the customer.
- A timeline of actions your CA took in response
A. These certificates were issued on 4/28 between 10:30 – 11:30pm PDT.
B. The linting of the certificates took place on 4/29 at approximately 1:00am PDT.
C. After failing linting, the eight (8) certificates were revoked on 4/29.
D. An internal MS incident was created on 4/29 to capture the details of the incident for our internal incident review process.
E. An initial review of the incident was performed with MS PKI leadership on 4/30.
F. On 5/4, the mis-issuance was reviewed with our WebTrust auditors.
G. On 5/11, a demo was performed with our WebTrust auditors to demonstrate that had the certificate been issued through our normal certificate issuance process the error would not have occurred.
H. On 6/2, a final internal incident review was completed with the MS PKI leadership team.
I. We have spent the last week preparing this timeline for submission to Bugzilla.
- Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem.
These were intended to be a one-time issuance to prove private key possession by MS PKI. This is a very infrequent process that we use manual controls on. Our pre-issuance review of the request files missed this issue, but our final quality checks (linting) identified the issue before the certificates were returned to the customer. We successfully issued eight (8) certs with the proper locality and state or province value approximately 10 days after this initial incident. We do not anticipate performing this manual process anytime in the near future.
- A summary of the problematic certs.
The certificates contained an organization value but did not contain a locality or state or province value. The issue with these certificates was discovered post-issuance during the quality testing phase and none were released to the customer. The post-issuance tools are in a separate environment from our CAs, so all eight (8) certificates were issued first and then quality checked a few hours later in that separate environment. Because this is a manual process, the eight certificates were handled in a batch through each step.
- The complete certificate data for the problematic certificates.
Attached to this bug.
- Explanation about how the mistakes were made or bugs introduced, and how they avoided detection until now.
Manual reviews of the certificate request files prior to issuance did not discover the missing locality fields. However, our post issuance checks did discover the problems and the certificates were revoked and never delivered to the customer. This is a very infrequent manual process that was used outside of our normal processes as the customer was outside of Microsoft.
- List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
These certificates were a one-time issuance to an external entity, which is very unusual and happens very infrequently, once or twice in 10 years. Although this was a manual process, we did identify the problems with the certificates before they were delivered to the customer and released into the wild; the certificates were not used and were revoked within 12 hours. Further, as demonstrated to our auditors, the tooling is in place to prevent these same errors from occurring during our normal automated processes.
We have now added manual pre issuance processes for external certificates that should prevent this from happening again. And if we ever start to issue external certificates more frequently, we will update our automation to allow for external and internal certificates.
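For readers following along, the defect described in this report (an organization value with neither a locality nor a state or province value) can be expressed as a simple lint. The following is only a minimal sketch and not MS PKI's tooling: the function name `lint_subject`, the dictionary representation of the subject, and the sample values are all assumptions made for illustration.

```python
from typing import Dict, List


def lint_subject(subject: Dict[str, str]) -> List[str]:
    """Return findings for a certificate subject (hypothetical lint).

    `subject` maps short attribute names (C, ST, L, O, CN) to their values.
    """
    findings: List[str] = []
    has_org = bool(subject.get("O", "").strip())
    has_locality = bool(subject.get("L", "").strip())
    has_state = bool(subject.get("ST", "").strip())
    # The condition reported in this bug: O present, but neither L nor ST.
    if has_org and not (has_locality or has_state):
        findings.append(
            "organization present but neither locality nor state/province present"
        )
    return findings


# Illustrative subjects only; these are not the affected certificates.
bad = {"C": "US", "O": "Example Corp", "CN": "example.com"}
good = {"C": "US", "ST": "Washington", "L": "Redmond",
        "O": "Example Corp", "CN": "example.com"}

print(lint_subject(bad))
print(lint_subject(good))
```

Running a check like this on the request data before signing, rather than only on the issued certificate, is what would move detection from post-issuance to pre-issuance.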
Comment 9•5 years ago
Thank you for this incident report.
(In reply to John Mason from comment #8)
The certificates were issued outside of our automated certificate issuance system because it was built to only issue certificates to subscribers internal to Microsoft and would have imposed incorrect values on the certificates, like O=Microsoft Corporation. Since this issuance, to an entity outside of Microsoft, was intended to be a one-time issuance, it did not make sense to rewrite our logic to allow for the issuance of these certificates. Instead we used manual quality checks pre and post issuance to validate the certificates before they were delivered to the customer.
Can you explain what those manual quality checks entailed?
- A timeline of actions your CA took in response
A. These certificates were issued on 4/28 between 10:30 – 11:30pm PDT.
B. The linting of the certificates took place on 4/29 at approximately 1:00am PDT.
C. After failing linting, the eight (8) certificates were revoked on 4/29.
D. An internal MS incident was created on 4/29 to capture the details of the incident for our internal incident review process.
E. An initial review of the incident was performed with MS PKI leadership on 4/30.
F. On 5/4, the mis-issuance was reviewed with our WebTrust auditors.
G. On 5/11, a demo was performed with our WebTrust auditors to demonstrate that had the certificate been issued through our normal certificate issuance process the error would not have occurred.
H. On 6/2, a final internal incident review was completed with the MS PKI leadership team.
I. We have spent the last week preparing this timeline for submission to Bugzilla.
Allowing more than a month to pass between detecting and reporting a misissuance is entirely too long and a violation of Mozilla guidance on incident reporting. Please explain why such a long time was required, and what is being done to ensure that future incident reports are timely.
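The interval in question can be checked directly from the timeline dates. This is a small arithmetic sketch; it assumes the year 2020 (taken from the first section of the report) and measures only up to the final internal review in item H, since the exact filing date is not stated in the timeline.

```python
from datetime import date

# Dates from the quoted timeline; the year 2020 comes from the report itself.
detected = date(2020, 4, 29)      # items B and C: linting failed, certificates revoked
final_review = date(2020, 6, 2)   # item H: final internal incident review

elapsed = (final_review - detected).days
print(elapsed)  # 34
```

Item I adds roughly another week of preparation on top of these 34 days before the report was filed.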
- A summary of the problematic certs.
The certificates contained an organization value but did not contain a locality or state or province value. The issue with these certificates was discovered post-issuance during the quality testing phase and none were released to the customer. The post-issuance tools are in a separate environment from our CAs, so all eight (8) certificates were issued first and then quality checked a few hours later in that separate environment. Because this is a manual process, the eight certificates were handled in a batch through each step.
I would like to point out that the fact that these certificates were not released to the customer is not a mitigation for the problem - at best, it's irrelevant. But it's concerning to see it repeated in this report as if it should be thought of as reducing the impact of this misissuance.
- The complete certificate data for the problematic certificates.
Attached to this bug.
- Explanation about how the mistakes were made or bugs introduced, and how they avoided detection until now.
Manual reviews of the certificate request files prior to issuance did not discover the missing locality fields. However, our post issuance checks did discover the problems and the certificates were revoked and never delivered to the customer. This is a very infrequent manual process that was used outside of our normal processes as the customer was outside of Microsoft.
- List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
These certificates were a one-time issuance to an external entity, which is very unusual and happens very infrequently, once or twice in 10 years. Although this was a manual process, we did identify the problems with the certificates before they were delivered to the customer and released into the wild; the certificates were not used and were revoked within 12 hours. Further, as demonstrated to our auditors, the tooling is in place to prevent these same errors from occurring during our normal automated processes.
We have now added manual pre issuance processes for external certificates that should prevent this from happening again. And if we ever start to issue external certificates more frequently, we will update our automation to allow for external and internal certificates.
Incident reports are intended to benefit the community in part by helping other CAs avoid the same mistake, and also to provide sufficient detail that others can validate the remediation steps being taken. With that in mind, can you describe the manual pre issuance processes that are now in place?
Comment 10•5 years ago (Assignee)
Great feedback Wayne, thank you.
Wayne - Can you explain what those manual quality checks entailed?
We attempted to replicate the automation that our existing tools perform, once the identity validation was completed. For the first issuance attempt, we had two qualified individuals manually inspect the certificate request files in a systematic manner, looking through each field for conformance to our CP and CPS. They each did their own review and compared notes afterwards.
Wayne - Allowing more than a month to pass between detecting and reporting a misissuance is entirely too long and a violation of Mozilla guidance on incident reporting. Please explain why such a long time was required, and what is being done to ensure that future incident reports are timely.
This is a factor of how we have chosen to internally handle incidents. We currently submit each incident to a review process internally before we share them broadly in Bugzilla. We worked hard to handle this as quickly as possible, but we are hearing the feedback that we must improve our turnaround time. We will work internally to improve the turnaround time and do some of the reporting in parallel (internal and external).
Wayne - I would like to point out that the fact that these certificates were not released to the customer is not a mitigation for the problem - at best, it's irrelevant. But it's concerning to see it repeated in this report as if it should be thought of as reducing the impact of this misissuance.
With due respect, our experience is that manual processes are imperfect in nature and mistakes happen. All our processes, manual and automated, have pre and post checks that act as a fail safe for detecting problems. We believe that post issuance checks, like linting, are important mitigations for detecting problems. We agree that catching it after the fact does not reduce the impact and apologize if the language above indicated otherwise.
Wayne - Incident reports are intended to benefit the community in part by helping other CAs avoid the same mistake, and also to provide sufficient detail that others can validate the remediation steps being taken. With that in mind, can you describe the manual pre issuance processes that are now in place?
The main difference in our manual pre issuance process now is that it has been documented, line by line, to ensure we cover all of the controls that our automated tools enforce. We continue to have at least two qualified individuals do the pre issuance reviews. In the medium term (end of the calendar year) we are working to update our automated tools, so that we can use them for both internal and external certificate issuance. Currently our automated tools were built to enforce both Certificate Profiles and Linting at the same time; our updates will pair these features to allow us to enforce both or each separately.
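To make the documented, two-person review concrete, the comparison step can be sketched as below. This is an illustrative sketch only: the checklist item names, the `compare_reviews` function, and the rule that both reviewers must pass every item are assumptions, since the actual documented checklist is not given in this bug.

```python
from typing import Dict, List

# Hypothetical checklist items; the real line-by-line document is not public here.
CHECKLIST = ["subject.O", "subject.L_or_ST", "subject.C", "keyUsage", "validity"]


def compare_reviews(a: Dict[str, bool], b: Dict[str, bool]) -> List[str]:
    """Return checklist items that block issuance.

    An item blocks issuance if either reviewer failed it or left it unchecked.
    """
    blocking: List[str] = []
    for item in CHECKLIST:
        if not (a.get(item, False) and b.get(item, False)):
            blocking.append(item)
    return blocking


reviewer_1 = {item: True for item in CHECKLIST}
# Second reviewer flags the missing locality / state or province value.
reviewer_2 = {**reviewer_1, "subject.L_or_ST": False}

print(compare_reviews(reviewer_1, reviewer_1))  # []
print(compare_reviews(reviewer_1, reviewer_2))  # ['subject.L_or_ST']
```

Treating an unchecked item as a failure means a reviewer cannot silently skip a field, which is the gap a written checklist is meant to close.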
Comment 11•5 years ago
This is a factor of how we have chosen to internally handle incidents. We currently submit each incident to a review process internally before we share them broadly in Bugzilla. We worked hard to handle this as quickly as possible, but we are hearing the feedback that we must improve our turnaround time. We will work internally to improve the turnaround time and do some of the reporting in parallel (internal and external).
Can you be more specific about the things that you'll be doing to improve turnaround time and parallel external reporting?
The main difference in our manual pre issuance process now is that it has been documented, line by line, to ensure we cover all of the controls that our automated tools enforce. We continue to have at least two qualified individuals do the pre issuance reviews. In the medium term (end of the calendar year) we are working to update our automated tools, so that we can use them for both internal and external certificate issuance. Currently our automated tools were built to enforce both Certificate Profiles and Linting at the same time; our updates will pair these features to allow us to enforce both or each separately.
Can you be more specific about what will be updated in your automated tools that will improve the quality of certificates issued?
Thanks.
Comment 12•5 years ago (Assignee)
Thanks Ben, with regard to your two questions:
- Can you be more specific about the things that you'll be doing to improve turnaround time and parallel external reporting?
We are going to improve the time it takes to report the incidents to Bugzilla by doing our internal and our Bugzilla reporting in parallel, as opposed to completing the internal review before we post to Bugzilla. We now have broad support from our leadership team to perform these in parallel. We have been working with them to understand all of the benefits of participating in this community, as well as our obligations. We have only been through this process a few times thus far and the team is becoming more comfortable with these processes.
- Can you be more specific about what will be updated in your automated tools that will improve the quality of certificates issued?
We are working to allow our automated tools to be configured for a broader set of use cases, such as internal and external certificates. The tools will be more easily configurable to allow Linting independent of enforcing Internal Certificate Profiles. These changes will allow our team to use our automated tools for tasks that we very rarely perform (as was the case in this incident, which resulted in us using manual checks).
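One way to picture linting that runs independently of internal profile enforcement is a pre-issuance pipeline with two separately switchable stages. The sketch below is hypothetical and does not describe the actual MS PKI tools: `PipelineConfig`, `internal_profile`, `lint`, and `pre_issuance_findings` are invented names, and the single profile rule is modeled only on the O=Microsoft Corporation example from the report.

```python
from dataclasses import dataclass
from typing import Dict, List

Subject = Dict[str, str]


def internal_profile(subject: Subject) -> List[str]:
    # Hypothetical internal profile rule, based on the report's O= example.
    if subject.get("O") != "Microsoft Corporation":
        return ["profile: O must be Microsoft Corporation"]
    return []


def lint(subject: Subject) -> List[str]:
    # The condition from this incident: O present without L or ST.
    if subject.get("O") and not (subject.get("L") or subject.get("ST")):
        return ["lint: O present without L or ST"]
    return []


@dataclass
class PipelineConfig:
    enforce_internal_profile: bool = True
    run_lint: bool = True


def pre_issuance_findings(subject: Subject, config: PipelineConfig) -> List[str]:
    """Run only the enabled stages and collect their findings."""
    findings: List[str] = []
    if config.enforce_internal_profile:
        findings += internal_profile(subject)
    if config.run_lint:
        findings += lint(subject)
    return findings


# Illustrative external subject; not one of the affected certificates.
external = {"C": "US", "O": "Example Corp", "CN": "example.com"}
lint_only = PipelineConfig(enforce_internal_profile=False, run_lint=True)

print(pre_issuance_findings(external, lint_only))
print(pre_issuance_findings(external, PipelineConfig()))
```

With the profile stage switched off, an external request is still linted before signing, so a rare external issuance would not have to fall back to a fully manual review.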
Comment 13•5 years ago
I recommend that we close this bug. I'll set a reminder to close this on or after 31-July-2020 unless we are notified of additional questions or issues.
Comment 14•5 years ago
No further questions here. I'm encouraged that Microsoft's leadership is recognizing the value in transparency as a publicly trusted CA, as Comment #12 reflects. I realize that, particularly for large technology companies operating CAs as incidental (i.e. not their core business area), this is a whole new level for them, and I'm encouraged that some are rising to the challenge. It's a positive response that stands in stark contrast to some other vendors' responses, by promoting timely transparency.