Buypass: Delayed revocation of TLS certificates
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: mads.henriksveen, Assigned: mads.henriksveen)
References
(Blocks 1 open bug)
Details
(Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03)
Attachments
(2 files)
Incident Report
This is a preliminary report.
Summary
Buypass has issued TLS certificates using an external DNS Resolver as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1872371.
Affected certificates should have been revoked within 5 days after we were made aware of the incident. This has not been completed.
A full incident report will be provided no later than Friday, January 5th, 2024.
Comment 1•2 years ago
The revocation needed to occur within 24 hours, not 5 days, per 4.9.1.1:
With the exception of Short‐lived Subscriber Certificates, the CA SHALL revoke a Certificate within 24 hours and use the corresponding CRLReason (see Section 7.2.2) if one or more of the following occurs:
...
- The CA obtains evidence that the validation of domain authorization or control for any Fully‐Qualified Domain Name or IP address in the Certificate should not be relied upon (CRLReason #4, superseded).
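As a rough illustration of the distinction drawn in this comment, the sketch below computes both candidate deadlines from a single awareness timestamp. The timestamp, the dictionary keys, and the function name are hypothetical; only the 24-hour and 5-day limits come from Section 4.9.1.1.

```python
from datetime import datetime, timedelta, timezone

# Revocation time limits from BR Section 4.9.1.1. The keys are hypothetical
# labels for this sketch, not identifiers taken from the BRs.
REVOCATION_LIMITS = {
    "validation_not_reliable": timedelta(hours=24),
    "not_issued_per_br": timedelta(days=5),
}


def revocation_deadline(aware_at: datetime, reason: str) -> datetime:
    """Return the latest permitted revocation time for the given reason."""
    return aware_at + REVOCATION_LIMITS[reason]


# Hypothetical awareness time, chosen only for illustration.
aware_at = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(revocation_deadline(aware_at, "validation_not_reliable"))
print(revocation_deadline(aware_at, "not_issued_per_br"))
```

Which limit applies depends entirely on which revocation reason the incident falls under, which is why the classification matters here.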
| Assignee | ||
Comment 2•2 years ago
We considered this to be a violation of the requirement 'The CA is made aware that the Certificate was not issued in accordance with CA/Browser Forum Baseline Requirements, the applicable Certificate Policy or the Certification Practice Statement'.
We agree that this should be considered a violation of the requirement 'The CA obtains evidence that the validation of domain authorization or control for any Fully-Qualified Domain Name or IP address in the Certificate should not be relied upon'.
| Assignee | ||
Comment 3•2 years ago
Incident Report
This is the full incident report.
Summary
Buypass has issued TLS certificates using an external DNS Resolver as described in https://bugzilla.mozilla.org/show_bug.cgi?id=1872371
All affected certificates should have been revoked within 24 hours after we were made aware of the incident. This has not been completed.
Impact
All certificates affected by the original incident are also affected by this incident.
Timeline
All times are UTC+1
2023-12-22:
- 09:00: We became aware that using external DNS Resolvers is considered to be use of a DTP (Delegated Third Party) and is not allowed for domain validation
- 10:00: We started investigation and concluded that Buypass ACME only used external DNS resolvers
- 11:26: We stopped issuing certificates
- 23:00: Switched Buypass ACME from using external DNS Resolvers to an internal DNS Resolver
- 23:15: Resumed certificate issuance. At this point in time we didn't allow for reusing domain validations for Buypass ACME (a temporary measure)
2023-12-23:
- 10:00: Started to identify affected Subscribers and certificates
2023-12-27: We started investigations to understand how the changes made in Buypass ACME affected the overall service, including the ability to handle large volumes of orders (renewals) in a short time
2023-12-28: Notified the first set of Subscribers with few certificates and encouraged them to renew their certificates immediately
2023-12-29: We started to get feedback from Subscribers that Buypass ACME had some issues and started investigations based on the feedback
2023-12-30: We continued to investigate issues for Subscribers
2024-01-02:
- 09:00: We understood that some reported issues with rate-limits were related to the temporary measure of not reusing domain validations
- 15:00: We started to allow for reuse of domain validations performed after December 23rd
- 15:30: We shortened the time window of the rate-limit that counts pending authorizations from 7 days to 1 day (to unblock Subscribers who had been blocked because domain validations could not be reused)
- 16:00: We revoked the first set of affected certificates (291 certs)
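The effect of the 15:30 change can be pictured with a simple sliding-window count. This is only a sketch under stated assumptions: the limit value, the sample data, and the function names are hypothetical, and the report does not describe how Buypass ACME actually implements its rate-limits.

```python
from datetime import datetime, timedelta


def pending_in_window(created_times, now, window):
    """Count pending authorizations created within `window` before `now`."""
    return sum(1 for t in created_times if now - window < t <= now)


def is_blocked(created_times, now, window, limit):
    """A Subscriber is blocked once the count in the window reaches the limit."""
    return pending_in_window(created_times, now, window) >= limit


# Hypothetical data: one pending authorization per day over the past six days.
now = datetime(2024, 1, 2, 15, 30)
pending = [now - timedelta(days=d, hours=1) for d in range(6)]
LIMIT = 5  # hypothetical limit; the actual value is not given in the report

print(is_blocked(pending, now, timedelta(days=7), LIMIT))  # 7-day window
print(is_blocked(pending, now, timedelta(days=1), LIMIT))  # 1-day window
```

With the same set of pending authorizations, the shorter window counts far fewer of them, so a Subscriber who was blocked under the 7-day window is no longer blocked under the 1-day window.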
2024-01-03:
- 10:00: The changes made had a positive effect, so we decided to notify Subscribers with high volumes of certificates as well
- 12:00: Completed the plan for notification and revocation of all affected Subscribers and certificates
- 14:00: Continued notification of affected Subscribers
- 22:00: We increased rate-limits for a set of Subscribers so that they could handle a large number of orders (renewals)
2024-01-04:
- 10:00: We increased the default rate-limits for all Subscribers
- 12:00: Revoked certificates (3000)
- 14:00: Revoked certificates (8622)
2024-01-05:
- 09:00: Revoked certificates (20000)
- 14:00: Revoked certificates (10000)
- 15:30: All affected Subscribers have been notified
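To put the timeline in terms of the 24-hour requirement, the following sketch computes how far past the deadline each listed revocation batch fell. It assumes the clock started at the 09:00 entry on 2023-12-22; the batch times and counts are copied from the timeline, and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))  # the report states all times are UTC+1

# Assumption: the 24-hour clock starts at the 09:00 entry on 2023-12-22.
aware_at = datetime(2023, 12, 22, 9, 0, tzinfo=CET)
deadline = aware_at + timedelta(hours=24)

# (time, count) for each revocation batch listed in the timeline.
batches = [
    (datetime(2024, 1, 2, 16, 0, tzinfo=CET), 291),
    (datetime(2024, 1, 4, 12, 0, tzinfo=CET), 3000),
    (datetime(2024, 1, 4, 14, 0, tzinfo=CET), 8622),
    (datetime(2024, 1, 5, 9, 0, tzinfo=CET), 20000),
    (datetime(2024, 1, 5, 14, 0, tzinfo=CET), 10000),
]

total = 0
for revoked_at, count in batches:
    total += count
    late_by = revoked_at - deadline
    print(f"{revoked_at:%Y-%m-%d %H:%M}  {count:>6} certs  {late_by} past deadline")
print("total revoked in timeline:", total)
```

The total covers only the batches listed in the timeline above, not every affected certificate.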
Root Cause Analysis
We failed to revoke all affected certificates within 24 hours due to several conditions.
We changed the solution to fix the original issue (using an external DNS Resolver) and added changes that affected its functionality (not reusing domain validations).
The changes were made right before the Christmas holidays.
Because of the high volume of affected certificates, our uncertainty about whether Subscribers were prepared to pick up and handle renewals during the holidays, and our wish to avoid disruption for affected Subscribers over Christmas, we decided to postpone the notification and mass revocation of affected certificates.
We had to ensure that the changed solution was able to handle the large volumes of certificate renewals before large-scale notifications and revocations could take place.
After notifying a set of Subscribers with few certificates to ensure renewal started at low scale, we experienced issues with Buypass ACME. These issues needed to be understood and fixed before we could start with large scale notification and mass revocation.
The issues were investigated and we understood their cause, but the fixes included both system changes and configuration changes, and it took time to verify that the solution was stable.
We were ready to start large-scale notification and mass revocation in early January. We have notified all Subscribers and will revoke all certificates no later than January 12th.
We are aware that this is a violation of the requirements. We have already made improvements, and we will use what we have learned to take further actions so that we are better prepared if we need to revoke certificates at a later time (see action items).
Lessons Learned
What went well
What didn't go well
- The incident occurred right before the Christmas holidays
- Unforeseen issues with the updated Buypass ACME delayed the process of large scale notification and mass revocation
Where we got lucky
Action Items
| Action Item | Kind | Due Date |
|---|---|---|
| Add support for ACME Renewal Information (ARI) in Buypass ACME | Mitigate | 2024-04-01 |
| Ensure Subscribers are aware of expectations of immediate renewal of certificates when needed | Mitigate | 2024-02-01 |
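For context on the first action item: ARI lets a CA publish a suggested renewal window for each certificate, and a CA that must revoke can move that window into the past so that clients polling ARI renew promptly. The sketch below shows a simplified client-side decision based on the `suggestedWindow` object of a renewalInfo response. The response bodies and function names are hypothetical, and this is not a description of how Buypass ACME or any particular client implements ARI.

```python
import json
import random
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # Replace a trailing "Z" so that older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def should_renew_now(renewal_info_json: str, now: datetime, rng=random.random) -> bool:
    """Pick a random time in the suggested window; renew if it has passed."""
    window = json.loads(renewal_info_json)["suggestedWindow"]
    start, end = parse_ts(window["start"]), parse_ts(window["end"])
    selected = start + (end - start) * rng()
    return selected <= now


# Hypothetical renewalInfo bodies, for illustration only.
past_window = '{"suggestedWindow": {"start": "2024-01-02T00:00:00Z", "end": "2024-01-03T00:00:00Z"}}'
future_window = '{"suggestedWindow": {"start": "2024-03-01T00:00:00Z", "end": "2024-03-02T00:00:00Z"}}'
now = datetime(2024, 1, 4, tzinfo=timezone.utc)

print(should_renew_now(past_window, now))
print(should_renew_now(future_window, now))
```

When the whole window lies in the past, every possible selected time has already passed, so the client renews immediately. That is the behavior that makes ARI useful ahead of a mass revocation.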
Appendix
Details of affected certificates
See affected certificates in https://bugzilla.mozilla.org/show_bug.cgi?id=1872371
| Assignee | ||
Comment 4•2 years ago
| Assignee | ||
Comment 5•2 years ago
| Assignee | ||
Comment 6•2 years ago
I have a correction to the full incident report regarding affected certificates. 657 of the certificates affected by the original incident were revoked before, or within 24 hours after, this incident occurred. See attached files.
We have revoked 86484 affected certificates.
| Assignee | ||
Comment 7•2 years ago
All affected certificates have now been revoked (or have expired).
| Assignee | ||
Comment 8•2 years ago
We have no new information in this bug.
| Assignee | ||
Comment 9•2 years ago
An update on the action items:
We will work on the action item 'Ensure Subscribers are aware of expectations of immediate renewal of certificates when needed' in parallel with the similar action item in https://bugzilla.mozilla.org/show_bug.cgi?id=1865368.
| Action Item | Kind | Due Date |
|---|---|---|
| Add support for ACME Renewal Information (ARI) in Buypass ACME | Mitigate | 2024-04-01 |
| Ensure Subscribers are aware of expectations of immediate renewal (and revocation) of certificates when needed | Mitigate | 2024-02-15 |
| Assignee | ||
Comment 10•1 year ago
An update on the action items:
| Action Item | Kind | Due Date |
|---|---|---|
| Add support for ACME Renewal Information (ARI) in Buypass ACME | Mitigate | 2024-04-01 |
| Ensure Subscribers are aware of expectations of immediate renewal (and revocation) of certificates when needed | Mitigate | DONE |
| Assignee | ||
Comment 11•1 year ago
We have no new information in this bug.
| Assignee | ||
Comment 12•1 year ago
We have no new information in this bug.
As the open action item has a due date of 2024-04-01, we kindly request that the NextUpdate field be set to 2024-04-03 due to the Easter holidays.
| Assignee | ||
Comment 13•1 year ago
An update on the action items:
We have not been able to complete the action item 'Add support for ACME Renewal Information (ARI) in Buypass ACME'. The due date has been updated.
| Action Item | Kind | Due Date |
|---|---|---|
| Add support for ACME Renewal Information (ARI) in Buypass ACME | Mitigate | 2024-05-15 |
| Ensure Subscribers are aware of expectations of immediate renewal (and revocation) of certificates when needed | Mitigate | DONE |
| Assignee | ||
Comment 14•1 year ago
We will update the action item no later than 2024-05-15.
| Assignee | ||
Comment 15•1 year ago
An update on the action items:
We will not be able to complete support for ACME Renewal Information (ARI) in Buypass ACME within the current development period (Q2). It will be finalized during the next development period (Q3). The due date has been updated.
| Action Item | Kind | Due Date |
|---|---|---|
| Add support for ACME Renewal Information (ARI) in Buypass ACME | Mitigate | 2024-09-15 |
| Ensure Subscribers are aware of expectations of immediate renewal (and revocation) of certificates when needed | Mitigate | DONE |
Comment 16•1 year ago
Given other CAs will be implementing ACME ARI, what barriers have you run into that changes the deployment roadmap? Sharing this information would be beneficial to all parties.
| Assignee | ||
Comment 17•1 year ago
(In reply to Wayne from comment #16)
Given other CAs will be implementing ACME ARI, what barriers have you run into that changes the deployment roadmap? Sharing this information would be beneficial to all parties.
We have not run into any explicit barriers. We had planned for the ACME ARI to be implemented during Q2. When the development team gave their estimates of the amount of work left (which includes using a new framework and extensive testing), we realized that it was not possible to complete this in parallel with other prioritized development tasks in Q2.
Comment 18•1 year ago
So to be more clear, you are choosing to not prioritize this work, in order to avoid disrupting other plans, correct? Could you help the community understand how you came to that prioritization?
| Assignee | ||
Comment 19•1 year ago
(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #18)
So to be more clear, you are choosing to not prioritize this work, in order to avoid disrupting other plans, correct? Could you help the community understand how you came to that prioritization?
Not really, we are still prioritizing both ARI and other tasks originally planned for Q2, but the initial estimates for the development tasks scheduled for Q2 were too optimistic. We have not prioritized other tasks over ARI. However, the implementation of ARI also depends on other development tasks currently in process.
Comment 20•1 year ago
Unless there are additional comments or questions, I think this bug can be closed later next week.
Comment 21•1 year ago
Action Item "Add support for ACME Renewal Information (ARI) in Buypass ACME" appears as still due to be done.
| Assignee | ||
Comment 22•1 year ago
(In reply to Ben Wilson from comment #21)
Action Item "Add support for ACME Renewal Information (ARI) in Buypass ACME" appears as still due to be done.
That's correct, we changed the due date to 2024-09-15 in comment #15. Please set Next update accordingly.
Comment 23•1 year ago
(In reply to Antti Backman from comment #24)
As all action items have been completed about a month ago, we kindly ask if this incident could be closed.
Antii,
You still have not dealt with the problem that your Root Cause Analysis is incorrect and that you have no Action Items to address the real root cause. I explained at length this failure in comment 7. Your reply in comment 11 failed to acknowledge your ultimate responsibility for ensuring that revocations occur on time, and other commentors’ subsequent feedback in comment 12 and comment 18 haven’t created any change in your position.
Mozilla gave us a good roadmap for dealing with the deliberate decision to miss revocation in bug 1889062 comment 15, where among other things Mozilla prescribes,
- detailed changes to policies and procedures to ensure timely revocation, including new guidelines, checklists, and approval processes; and
- monitoring and auditing to ensure compliance with such policies and procedures and to identify any lapses quickly
I believe you need to publish a Root Causes Analysis that acknowledges your CA’s sole responsibility for on-time revocation and credible Action Items that will address this problem, such as those given above, before this issue can be closed.
Ben, given the grilling that Entrust and several other CAs have received recently in relation to the exact same sort of bug, I have to push back on your comment 25. I recommend leaving this issue open until Telia deals with these matters.
Comment 24•1 year ago
(In reply to Tim Callan from comment #23)
My apologies. I put this in the wrong bug. Please ignore.
| Assignee | ||
Comment 25•1 year ago
An update on the action items:
We have now implemented support for ARI in Buypass ACME.
| Action Item | Kind | Due Date |
|---|---|---|
| Add support for ACME Renewal Information (ARI) in Buypass ACME | Mitigate | DONE |
| Ensure Subscribers are aware of expectations of immediate renewal (and revocation) of certificates when needed | Mitigate | DONE |
Comment 26•1 year ago
Leaving this open for the time being.
| Assignee | ||
Comment 27•1 year ago
We have no new information in this bug.
| Assignee | ||
Comment 28•1 year ago
We have no new information in this bug and all action items have been closed. If there are no more comments or questions, I suggest we close this bug.
| Assignee | ||
Comment 29•1 year ago
We have no new information in this bug. If there are no more comments or questions, I suggest we close this bug.
Comment 30•1 year ago
I'd like to keep this open for at least another month, during which we can continue to work on better approaches and solutions for delayed revocation situations.
Comment 31•1 year ago
We continue work on incident-reporting and compliance requirements aimed at reducing delayed revocation, so this bug will remain open until at least February 1, 2025. Meanwhile, CAs should review https://github.com/mozilla/www.ccadb.org/pull/186.
| Assignee | ||
Comment 32•1 year ago
Incident Report Closure Summary
- Incident Description: Buypass was not able to revoke approximately 177,000 certificates within 24 hours.
- Incident Root Cause(s): The underlying incident was identified right before the Christmas holidays, and as we were not confident that subscribers would read notifications of revocation during the holidays, we were concerned that revocation could be disruptive. Changes to our solution were also needed before it could handle large numbers of certificate renewals.
- Remediation Description: We ensured that the solution handled high volumes of renewals. We included support for ARI. We notified subscribers to make them aware of expectations of immediate renewal (and revocation) and included this in the subscriber agreement.
- Commitment Summary: Buypass commit to follow revocation requirements as defined in BR.
All Action Items disclosed in this Incident Report have been completed as described, and we request its closure.
Comment 33•1 year ago
I intend to close this on or about Wednesday, 12-Feb-2025.
Comment 34•1 year ago
(In reply to Mads Henriksveen from comment #32)
Mads,
We notified subscribers to make them aware of expectations of immediate renewal (and revocation) and included this in the subscriber agreement.
I sincerely hope your Subscriber Agreement contained such language prior to this incident.
Question 1: Was the right for Buypass to immediately revoke certificates included in your Subscriber Agreement at the beginning of December, 2023? If not, you need to open another Bugzilla bug for that. If yes, what is the meaningful change you intend for this line to convey? Why was that language ineffective the last time and what did you do to make it effective this time?
Buypass commit to follow revocation requirements as defined in BR.
Very general statements like this that reference some abstract obedience to the BRs or root program requirements without spelling out the actual commitment have been a source of much trouble on Bugzilla for the past year. In 2024 multiple CAs with delayed revocation problems used overly vague statements like this one to avoid committing to meaningful change while pretending otherwise.
While I appreciate Mozilla’s guidance that a blanket ban on deliberate delay of mandatory revocation is not a Mozilla root store requirement, and therefore not required to close this bug, I still think you can do better than you have here. After all, Buypass had an obligation to obey the Baseline Requirements even as it chose to delay revocation of 177,000 certificates for the convenience of its Subscribers. I must imagine, had someone asked you on December 1, 2023, you would have answered that yes, you follow the Baseline Requirements. And yet, we wound up here.
Your action items don’t provide any help. One of them is implementation of a technical capability that is external to the decision not to revoke on time. The other is a customer communication task. I don’t see a policy change or procedural change as part of your action items. I don’t see a commitment to the Bugzilla community to apply rigor to this section of the BRs, and I don’t see any admission of culpability on the part of Buypass.
Some things you could do to inspire greater confidence in future compliance would include:
- Clearly state that you recognize that your decision to delay revocation in December of 2023 was wrong at the time, according to the published rules of that day and a public CA’s obligation to the entire WebPKI, and that given an opportunity to do it over again, you would have made a different choice.
- Exact a formal policy change to make it orders of magnitude harder to gain a deliberate revocation delay, train employees on this policy, and publish its language publicly on this incident.
- Clarify that holidays, vacation periods, weekends, freeze periods, and the like will have no bearing whatsoever on mandatory revocation timelines going forward.
- Either make a firm commitment to never willingly delay mandatory revocation again (which, though not required, some CAs are doing) or spell out very specifically what your process and criteria will be before granting a delay in the future.
Comment 35•1 year ago
I'll hold off on closing this bug until after Buypass has filed its response to Comment #34.
| Assignee | ||
Comment 36•1 year ago
(In reply to Tim Callan from comment #34)
(In reply to Mads Henriksveen from comment #32)
Mads,
We notified subscribers to make them aware of expectations of immediate renewal (and revocation) and included this in the subscriber agreement.
I sincerely hope your Subscriber Agreement contained such language prior to this incident.
Question 1: Was the right for Buypass to immediately revoke certificates included in your Subscriber Agreement at the beginning of December, 2023? If not, you need to open another Bugzilla bug for that. If yes, what is the meaningful change you intend for this line to convey? Why was that language ineffective the last time and what did you do to make it effective this time?
Buypass commit to follow revocation requirements as defined in BR.
Very general statements like this that reference some abstract obedience to the BRs or root program requirements without spelling out the actual commitment have been a source of much trouble on Bugzilla for the past year. In 2024 multiple CAs with delayed revocation problems used overly vague statements like this one to avoid committing to meaningful change while pretending otherwise.
While I appreciate Mozilla’s guidance that a blanket ban on deliberate delay of mandatory revocation is not a Mozilla root store requirement, and therefore not required to close this bug, I still think you can do better than you have here. After all, Buypass had an obligation to obey the Baseline Requirements even as it chose to delay revocation of 177,000 certificates for the convenience of its Subscribers. I must imagine, had someone asked you on December 1, 2023, you would have answered that yes, you follow the Baseline Requirements. And yet, we wound up here.
Your action items don’t provide any help. One of them is implementation of a technical capability that is external to the decision not to revoke on time. The other is a customer communication task. I don’t see a policy change or procedural change as part of your action items. I don’t see a commitment to the Bugzilla community to apply rigor to this section of the BRs, and I don’t see any admission of culpability on the part of Buypass.
Some things you could do to inspire greater confidence in future compliance would include:
- Clearly state that you recognize that your decision to delay revocation in December of 2023 was wrong at the time, according to the published rules of that day and a public CA’s obligation to the entire WebPKI, and that given an opportunity to do it over again, you would have made a different choice.
- Exact a formal policy change to make it orders of magnitude harder to gain a deliberate revocation delay, train employees on this policy, and publish its language publicly on this incident.
- Clarify that holidays, vacation periods, weekends, freeze periods, and the like will have no bearing whatsoever on mandatory revocation timelines going forward.
- Either make a firm commitment to never willingly delay mandatory revocation again (which, though not required, some CAs are doing) or spell out very specifically what your process and criteria will be before granting a delay in the future.
Thank you for the feedback and recommendations.
Please remember that this bug was filed more than one year ago, and the closure summary reflects the actions we took as part of the immediate follow-up of the incident. It does not reflect all actions taken later in response to the industry's subsequent focus on late revocations, which was prompted by the high number of late revocation incidents. As a general statement, we are much better prepared now to handle similar situations, and it is our clear intention to avoid late revocations altogether. This is supported by changes in internal policies, procedures and systems that minimize the risk of late revocations. We also maintain a continuous engagement and dialogue with our subscribers, both directly and indirectly, to ensure they are well informed and prepared for immediate revocations in the future.
To answer your question, we informed all our subscribers about the importance of immediate revocations both by direct communication and by clarifying this in the subscriber agreement. Our right to do immediate revocations was already included in the agreement, but we clarified that this is an obligation in case of an incident.
Comment 37•1 year ago
I will close this tomorrow, 14-Feb-2025.