Closed Bug 1887888 Opened 2 years ago Closed 1 year ago

Hongkong Post: Delayed revocation of TLS certificates with basicConstraints not marked as critical

Categories: CA Program :: CA Certificate Compliance, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: manho, Assigned: manho
References: Blocks 1 open bug
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03
Attachments: 1 file, 1 obsolete file (19.03 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)

Incident Report

This is a preliminary report.

Summary

Hongkong Post CA has issued a total of 46 certificates that are affected by the problem with basicConstraints not marked as critical, as described in the bug report at https://bugzilla.mozilla.org/show_bug.cgi?id=1887008.

All certificates should have been revoked within 5 days of the CA being made aware of the incident. As mentioned in the bug report above, we have been collaborating closely with our affected customers to facilitate the replacement of their certificates, per the plan outlined in the original bug report. Because the continuity of those customers' critical infrastructure depends on the affected certificates, not all affected certificates were revoked in time.

The affected TLS certificates were mainly issued to government bureaus or departments in Hong Kong SAR, serving the entire local population of over 8 million people. Our major subscribers offer a wide spectrum of essential community e-services, such as Financial Service, Online Payment, Medical and Health, Transport, Trade, Public Order and Legal Aid Service. They raised significant concerns, which made it difficult to coordinate the revocation of the affected TLS certificates within the given period of 5 days. If these certificates are revoked without proper coordination and assured completion of replacement TLS certificates by the subscriber organisations, it could lead to substantial cumulative impacts on the sustainable delivery of critical e-services by our government subscribers.

While our TLS certificates are mainly subscribed to by local customers providing e-services to the local community, we are fully aware of the expectations of the wider WebPKI community regarding the prompt revocation of the affected TLS certificates, as demonstrated by some CA owners. To minimize the impact on the broader web, we take full responsibility for collaborating closely with our affected customers to give top priority to the replacement process.

We anticipate completing the revocation process for all affected certificates by 2024-05-20, in accordance with the plan outlined in the original bug report and contingent upon no special concern or constraint from any individual subscribers.

Impact

45 out of the 46 affected certificates were not revoked in time; these 45 certificates are the subject of this bug.

Timeline

All times are UTC+8.
The provided timeline focuses solely on the events related to the delayed revocation of the affected TLS certificates. For a comprehensive understanding of the original bug, please refer to https://bugzilla.mozilla.org/show_bug.cgi?id=1887008.

2024-03-21:

  • 08:58 We were made aware of this error and started examining the matter with the compliance team.

2024-03-22:

  • 18:00 The cause of the problem was identified. Work started on a patch for the certificate issuance system to incorporate the latest version (3.6.1) of zlint.
  • 18:54 A total of 46 affected TLS certificates were identified, and the list was attached to the original bug.

2024-03-25:

  • 19:53 The system patch has been successfully implemented in the production system. Any Certificate Signing Request (CSR) that contains a basicConstraints extension failing the zlint linting will be rejected.
  • 20:12 Resumed the issuance of TLS certificates to our customers, allowing them to receive new TLS certificates from our platform.

2024-03-26:

  • 12:40 Confirmed that 1 affected certificate was promptly revoked, while the remaining 45 certificates could not be revoked in the given period of 5 days.
  • 22:40 Posting this preliminary report.
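
The deadline arithmetic behind these timeline entries can be sketched as follows. This is only an illustration: it assumes the 5-day window is counted from the 08:58 entry on 2024-03-21, and all names in the code are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# All times in the timeline are UTC+8.
UTC_PLUS_8 = timezone(timedelta(hours=8))

# Assumption: the clock starts at the 2024-03-21 08:58 timeline entry.
made_aware = datetime(2024, 3, 21, 8, 58, tzinfo=UTC_PLUS_8)
revocation_window = timedelta(days=5)
deadline = made_aware + revocation_window

# The 2024-03-26 12:40 entry records that 45 certificates were still unrevoked.
status_check = datetime(2024, 3, 26, 12, 40, tzinfo=UTC_PLUS_8)
overdue = status_check - deadline

print("Deadline:", deadline.isoformat())
print("Past deadline at status check:", overdue > timedelta(0))
```

Under that assumption the window closed at 08:58 on 2024-03-26, a few hours before the 12:40 status check.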

Root Cause Analysis

In response to this certificate problem, our top priority is to promptly reach out to our major customers, followed by the other affected customers, to discuss the potential impact on their websites and online services. We confirm that there have been, and will be, no actual disruptions to websites or online services using our TLS certificates due to the certificate policies extension. However, all affected customers expressed grave and genuine concerns about revoking the impacted TLS certificates within the given period of 5 days.

As stated in the original bug report, all affected TLS certificates are managed manually by the customers themselves. This process typically involves multiple personnel, including the authorized representative who handles the certificate application, the manager who grants authorization, the system administrator, and the end users of the online service who conduct regression testing, especially for government websites and online services. These individuals must go through several manual procedures to complete installation of the TLS certificates on their systems. Certificate replacement, even when requested urgently to resolve the current incident, takes considerable time for the customers concerned to complete through a manual process.

Furthermore, our major customers, which include government bureaus and departments, do not use a unified solution for managing and deploying their TLS certificates. Consequently, a mass revocation of certificates, as called for, would take significant time, because we would need to confirm with each individual customer that the new TLS certificates have been installed on their respective servers. Revoking impacted certificates without proper coordination and assured completion of replacement TLS certificates could have a significant cumulative impact on government websites and online services.

Lessons Learned

What went well

  • Major customers are willing to cooperate.

What didn't go well

  • The affected TLS certificates are being utilized by our major customers in critical applications. Implementing emergency processes to replace these certificates may introduce higher risks, potentially impacting the stability and security of their applications.
  • It is challenging to explain to some customers the reasons for the urgent revocation and replacement of certificates, as well as the differences in appearance between the certificates before and after replacement.

Where we got lucky

  • N/A

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | 2024-05-20 |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons. | Prevent | 2024-05-20 |

Appendix

Details of affected certificates

See full list in the original bug.

Based on Incident Reporting Template v. 2.0

Assignee: nobody → manho
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [leaf-revocation-delay]

This incident report has gone "stale" and violates the expectations described on CCADB.org.

Flags: needinfo?(manho)

Thanks for bringing this to my attention. I'm sorry for the unintentional oversight in not providing an update here while addressing the original bug, https://bugzilla.mozilla.org/show_bug.cgi?id=1887008, on 2024-04-26. We have already notified all affected customers about the incident and about the need to replace their certificates quickly. The revocation process for the affected certificates is currently underway.

Flags: needinfo?(manho)

The original bug report included a status update on the ongoing revocation process for the affected TLS certificates, and that update is provided again here for easy reference.

2024-04-26:

  • 23:00 In the process of re-issuing TLS certificates, a total of 43 new certificates have been provided to our customers. Consequently, 3 certificates remain outstanding; these are either awaiting the customers' generation of a Certificate Signing Request (CSR) or have already been confirmed as no longer required. With confirmation from our customers, we have already revoked 26 of the affected TLS certificates. We anticipate that this number will continue to increase as their certificates are replaced.

    It’s worth noting that out of the 46 affected TLS certificates, 1 was revoked within 5 days of our being made aware of the incident.

2024-05-03:

  • 18:15 The re-issuance of TLS certificates is underway. We have additionally re-issued 2 new certificates, bringing the total number of new certificates provided to our customers to 45. Additionally, 1 certificate has been verified as no longer required and has been revoked. With confirmation from our customers, we have already revoked 41 TLS certificates that were affected.

2024-05-10

  • 20:45 With confirmation from our customers, we have already revoked 44 certificates (accounting for 95.7% of the affected certificates), and 2 certificates (4.3%) are pending confirmation from customers regarding the successful replacement of their affected certificates.
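
The percentages in this update follow directly from the counts. A minimal sketch of the arithmetic, using a hypothetical helper name:

```python
TOTAL_AFFECTED = 46  # distinct affected certificates, per the report

def share_of_affected(count: int, total: int = TOTAL_AFFECTED) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

revoked = 44
pending = TOTAL_AFFECTED - revoked

print(f"revoked: {revoked} ({share_of_affected(revoked)}%)")
print(f"pending: {pending} ({share_of_affected(pending)}%)")
```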

2024-05-17

  • 20:30 All affected certificates were revoked.

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | DONE |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons. | Prevent | DONE |

Does anyone have any comments, questions, or suggestions?

My understanding was that a delayed revocation incident was to include per-subscriber detail as to specifically why each subscriber could not (rather than “would rather not”) have their certificates revoked on the appropriate timeline. Is that not the case? I don’t see such detail here. It took almost two months for the last of these certificates to be revoked. Were those subscribers prioritizing the work? Weekend shifts or paying for overtime? Would they have been out of service for eight weeks if their private key had been compromised and OneCRL had blocked the certificates?

More generally, I feel that if a CA is going to be “unable” to revoke certificates appropriately because they are in certain industries, then the CA should not be issuing certificates to subscribers in those industries. If they cannot operate within the bounds of WebPKI’s requirements, then they should look outside WebPKI for their authentication needs.

Questions for Hongkong Post CA:

  • when certificates were issued to these subscribers, was Hongkong Post CA aware of the BR requirement that misissued certificates needed to be revoked within 24 hours (limit 5 days)?
  • when certificates were issued to these subscribers, was Hongkong Post CA aware that these subscribers were planning to deploy those certificates to services for which operational disruption could lead to “substantial cumulative impacts on the sustainable delivery of critical e-services by our government subscribers”?
  • if the answers to those questions are “yes”, then why did Hongkong Post CA issue the certificates without assurances that the subscribers would be able to appropriately react to the specified timeline for revocation?
  • what does Hongkong Post CA communicate to prospective subscribers about the revocation requirements that Hongkong Post CA has agreed to uphold?
  • what steps is Hongkong Post CA taking to ensure that no further certificates are issued (including renewal) to subscribers whose limitations would lead Hongkong Post CA to delay revocation in any circumstance in the future?

The root cause section says

In response to this certificate problem, our top priority is to promptly reach out to our major customers, followed by the other affected customers, to discuss the potential impact on their websites and online services.

which I think is worrying. A CA’s top priority, upon discovering that they have misissued certificates, should be to ensure that those certificates are revoked promptly to undo the damage (large or small) that said certs do to WebPKI’s integrity.

Flags: needinfo?(manho)

(In reply to Man Ho from comment #5)

2024-05-17

  • 20:30 All affected certificates were revoked.

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | DONE |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons. | Prevent | DONE |

I worry about the framing/focus of this second Action Item (as well as how it's measured). There are a myriad of reasons certificates may need to be replaced and doing so shouldn't be a noteworthy action.
If we view the goal to be ensuring certificates can be replaced at short notice (fullstop), do the identified Action Items adequately serve to prevent a recurrence of revocation being delayed? What Action Items could further progress towards such a goal?

(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #7)

The list of affected certificates was attached in the original bug report at https://bugzilla.mozilla.org/show_bug.cgi?id=1887008. Here is the link https://bugzilla.mozilla.org/attachment.cgi?id=9393049 for easy reference. After duplicated pre-certificate and final certificate log entries were removed, a total of 46 distinct certificates were identified as requiring revocation. The affected certificates were mainly issued to government bureaus or departments in Hong Kong SAR and to financial institutions serving a critical electronic payment infrastructure of Hong Kong. We examined their conditions with respect to the exceptional circumstances laid down in Mozilla’s revocation guidelines (https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation), and were convinced that revoking the affected certificates within the prescribed deadline might cause a large impact on the sustainable service delivery of critical infrastructure.

One certificate was promptly revoked, but unfortunately, there was a delay in revoking the remaining 45 certificates due to a system issue. The issue is related to a bug in our certificate issuance system where the criticality flag of the basicConstraints extension would be inadvertently overridden by the CSR provided by the customer. When we implemented a system patch on 2024-03-25 to reject any CSR that contains a basicConstraints extension failing the zlint linting, it only allowed the issuance of certificates to unaffected customers. The bug in the certificate issuance system still hindered the affected customers from generating new certificates for replacement. The issue was not fixed until we received the tailor-made system patch from the vendor. After successful testing, it was implemented in production more than a month later. As a result, the affected subscribers were able to generate new certificates and replace the old ones. However, this entire process caused a delay in promptly revoking the affected certificates.
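
The override behaviour described above can be illustrated with a small model. This is a hypothetical sketch, not the vendor's code or the actual patch: the `Extension` type, the `PROFILE_CRITICALITY` table, and both functions are invented here to contrast copying the criticality flag from the CSR with taking it from the CA's own profile.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Extension:
    """Simplified stand-in for an X.509 extension (hypothetical model)."""
    name: str
    critical: bool

# Hypothetical CA-side profile: criticality the CA requires per extension.
PROFILE_CRITICALITY = {"basicConstraints": True}

def build_from_csr_unchecked(requested: Extension) -> Extension:
    """Models the defect: the CSR's criticality flag is copied through as-is."""
    return requested

def build_with_profile_enforced(requested: Extension) -> Extension:
    """Models one possible fix: the profile's criticality overrides the CSR's."""
    required = PROFILE_CRITICALITY.get(requested.name)
    if required is None:
        # The profile has no rule for this extension; leave it untouched.
        return requested
    return replace(requested, critical=required)

# A CSR that asks for basicConstraints without the critical flag.
csr_extension = Extension(name="basicConstraints", critical=False)

print(build_from_csr_unchecked(csr_extension))
print(build_with_profile_enforced(csr_extension))
```

The interim patch of 2024-03-25 took a third route: it rejected such CSRs outright rather than correcting them.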

Questions for Hongkong Post CA:

  • when certificates were issued to these subscribers, was Hongkong Post CA aware of the BR requirement that misissued certificates needed to be revoked within 24 hours (limit 5 days)?
  • when certificates were issued to these subscribers, was Hongkong Post CA aware that these subscribers were planning to deploy those certificates to services for which operational disruption could lead to “substantial cumulative impacts on the sustainable delivery of critical e-services by our government subscribers”?
  • if the answers to those questions are “yes”, then why did Hongkong Post CA issue the certificates without assurances that the subscribers would be able to appropriately react to the specified timeline for revocation?
  • what does Hongkong Post CA communicate to prospective subscribers about the revocation requirements that Hongkong Post CA has agreed to uphold?
  • what steps is Hongkong Post CA taking to ensure that no further certificates are issued (including renewal) to subscribers whose limitations would lead Hongkong Post CA to delay revocation in any circumstance in the future?

These subscribers, who are government bureaus or departments in Hong Kong SAR serving critical government e-services and financial institutions serving a critical electronic payment infrastructure, have been using these certificates for many years. We were fully aware of the requirement in the BR that misissued certificates needed to be revoked within 24 hours (or within a limit of 5 days), and we had previously demonstrated our commitment to comply with this BR requirement in the rare instances when we mistakenly issued certain certificates.
We acknowledge that some of the subscribers may not fully understand or be aware of this requirement, which Hongkong Post CA has an obligation to uphold. Hence, we have identified additional action items to improve the situation. We are committed to following Mozilla’s revocation guidelines (https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation) in handling the revocation of certificates. To this end, we have set out the following action items, which aim to prevent a recurrence of this incident (action items 1 - 3) and future revocation delays (action item 4).

| Action Item | Kind | Due Date |
| --- | --- | --- |
| 1. Upgrade zlint to the latest version (3.6.2) | Prevent | 2024-05-31 |
| 2. Include “pkilint” as a pre-issuance linting tool in the certificate issuance process. Both zlint and pkilint will be used in parallel. Every certificate must pass both linters before issuance. | Prevent | 2024-06-07 |
| 3. Plan for upgrade of the certificate issuance system to the latest version that enforces all mandated configuration on the certificate, including the basicConstraints extension. | Prevent | 2024-06-30 |
| 4. Educate customers on the certificate revocation requirement, train them on impact analysis for their e-services, and facilitate their preparation for a swift certificate replacement process and contingency planning for enforced certificate revocation, to minimize disruptions to e-services. | Prevent | 2024-06-30 |
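
Action item 2 amounts to a gate in which a blocking finding from either linter stops issuance. A minimal sketch of that decision logic follows, assuming each linter's output has already been parsed into simple records; the `Finding` type, the severity labels, and `may_issue` are hypothetical and do not reflect the real zlint or pkilint output formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """One parsed lint result (hypothetical shape, not a real linter format)."""
    linter: str
    lint_id: str
    severity: str

# Assumption: these severity labels are the ones treated as blocking.
BLOCKING_SEVERITIES = {"error", "fatal"}

def may_issue(zlint_findings: list[Finding], pkilint_findings: list[Finding]) -> bool:
    """Allow issuance only if neither linter reported a blocking finding."""
    combined = [*zlint_findings, *pkilint_findings]
    return not any(f.severity in BLOCKING_SEVERITIES for f in combined)

clean: list[Finding] = []
# The lint_id below is a made-up example identifier.
flagged = [Finding("zlint", "example_basic_constraints_not_critical", "error")]

print(may_issue(clean, clean))
print(may_issue(flagged, clean))
```

Treating the two result sets as one combined list keeps the rule symmetric: it does not matter which linter raises the blocking finding.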
Flags: needinfo?(manho)

(In reply to Clint Wilson from comment #8)

(In reply to Man Ho from comment #5)

2024-05-17

  • 20:30 All affected certificates were revoked.

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Continue to revoke the certificates that have been delayed in revocation. | Mitigate | DONE |
| Ensure customers are aware of possible situations where certificates must be replaced quickly for security reasons. | Prevent | DONE |

I worry about the framing/focus of this second Action Item (as well as how it's measured). There are a myriad of reasons certificates may need to be replaced and doing so shouldn't be a noteworthy action.
If we view the goal to be ensuring certificates can be replaced at short notice (fullstop), do the identified Action Items adequately serve to prevent a recurrence of revocation being delayed? What Action Items could further progress towards such a goal?

As I mentioned above, we have set out new action items that aim to prevent a recurrence of this incident (action items 1 - 3) and future revocation delays (action item 4). We genuinely value the community's input and would appreciate any feedback on these action items.

(In reply to Man Ho from comment #9)

One certificate was promptly revoked, but unfortunately, there was a delay in revoking the remaining 45 certificates due to a system issue. The issue is related to a bug in our certificate issuance system where the criticality flag of the basicConstraints extension would be inadvertently overridden by the CSR provided by the customer. When we implemented a system patch on 2024-03-25 to reject any CSR that contains a basicConstraints extension failing the zlint linting, it only allowed the issuance of certificates to unaffected customers. The bug in the certificate issuance system still hindered the affected customers from generating new certificates for replacement. The issue was not fixed until we received the tailor-made system patch from the vendor. After successful testing, it was implemented in production more than a month later. As a result, the affected subscribers were able to generate new certificates and replace the old ones. However, this entire process caused a delay in promptly revoking the affected certificates.

In my opinion, this is not an acceptable reason for delayed revocation and a gross abdication of Hongkong Post CA's commitments as part of the root programs. The subscribers could have had a certificate issued by a different CA if you were unable to issue them with correct certificates at that point in time, and you could have then revoked promptly.

Was this option discussed with subscribers? If not, why not? If so, why was it not pursued?

Additionally, what is Hongkong Post CA doing to ensure that they can get their issuing software fixed in a time-frame that allows them to meet the requirements of the BRs for revocation after misissuance? What will Hongkong Post commit to doing if it is discovered again that they have misissued a certificate due to a bug in their software?

Flags: needinfo?(manho)

(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #11)

Was this option discussed with subscribers? If not, why not? If so, why was it not pursued?

Additionally, what is Hongkong Post CA doing to ensure that they can get their issuing software fixed in a time-frame that allows them to meet the requirements of the BRs for revocation after misissuance? What will Hongkong Post commit to doing if it is discovered again that they have misissued a certificate due to a bug in their software?

When our major customers were made aware of the incident, we entered into discussions about this option. They carefully evaluated its impact and considered various factors such as the certificate application process, the system changes that might be involved, additional manpower requirements, internal policy and guidelines, and so on. Ultimately, they did not take the option.

During the incident, we immediately made an urgent request to the vendor's management, emphasizing the need for prompt resolution as a top priority. We closely monitored the delivery of the tailor-made system patch and allocated additional resources for thorough system testing and regression testing upon its receipt ensuring the system patch could be applied at the earliest time. We recognize that the vendor's commitment and capability in maintaining software agility to meet BR requirements are of paramount importance. To address this, while we plan to upgrade the certificate issuance system to the latest version as a short-term solution (action item 3), we will also explore the possibility of changing the certificate issuance system and/or vendor for a more reliable and sustainable solution in the longer term. We are committed to follow Mozilla’s revocation guidelines for handling the revocation of mis-issued certificates.

Flags: needinfo?(manho)

(In reply to Man Ho from comment #12)

During the incident, we immediately made an urgent request to the vendor's management, emphasizing the need for prompt resolution as a top priority. We closely monitored the delivery of the tailor-made system patch and allocated additional resources for thorough system testing and regression testing upon its receipt ensuring the system patch could be applied at the earliest time.

And yet, it took a month. That you pushed the vendor so hard in this case and it took a month is not a point in favour of your ability (really willingness) to meet your commitments.

Ultimately, though, it would have been the same problem if you’d had in-house software that took a month to fix, so this is more an expression of concern for your sake than a material concern. You assume the risk for using a vendor’s software and not having an agreement with them that ensures that you get fixes in the timelines required by your commitments to the BRs, and that is a perfectly legitimate choice for you to make. What is not acceptable is for you to push that risk onto the users of the WebPKI through willful non-conformance.

We are committed to follow Mozilla’s revocation guidelines for handling the revocation of mis-issued certificates.

What is different about your commitment now, versus at the beginning of the underlying incident? Were you not committed to following the revocation guidelines then?

[Edit: the remainder of this comment was in error, and I apologize for it.]

~~But also, you quite conspicuously did not answer these questions from the comment you replied to:

(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #11)

Was (having subscribers obtain replacement certificates from another CA that was capable of issuing correct certificates) discussed with subscribers? If not, why not? If so, why was it not pursued?

I think that failure to do that, and instead keeping invalid certificates live for a month, is a very serious issue and I think Hongkong Post should explain their reasoning very clearly—keeping in mind that their first responsibility as a CA is to the WebPKI, and not to the convenience of their customers, or to them keeping those customers.

I have bolded my questions in this comment to make it easier for you to ensure that you respond to all of them.~~

Flags: needinfo?(manho)

I owe an apology, I clearly missed the entire first paragraph of your response.

(In reply to Man Ho from comment #12)

(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #11)

Was this option discussed with subscribers? If not, why not? If so, why was it not pursued?

When our major customers were made aware of the incident, we entered into discussions about this option. They carefully evaluated its impact and considered various factors such as the certificate application process, the system changes that might be involved, additional manpower requirements, internal policy and guidelines, and so on. Ultimately, they did not take the option.

Were the subscribers informed about your duty to revoke under the BRs? Did you set a maximum time that you would wait for a fix before revoking without being able to issue a replacement? Would you have waited two months to revoke? A year?

If the alternative to switching to another CA is “nothing happens” then obviously it is less work and disruption for them to just wait. They made a legal agreement to tolerate immediate revocation, though, and the purpose of that agreement is to protect CAs such that they can do their agreed duty to uphold the BRs. Instead it seems that Hongkong Post decided to disregard the crystal-clear intent of the BRs, putting the convenience of their subscribers ahead of their responsibilities to the WebPKI.

So only the most important question remains, I think:

What is different about your commitment now, versus at the beginning of the underlying incident? Were you not committed to following the revocation guidelines then?

And, why was a detailed list of subscribers with the rationale for an “exceptional” delay in revocation not provided? The BRs require more than a mere certificate list and summary of some subscribers’ preference to not expend “additional manpower”, in my opinion.

Flags: needinfo?(manho)

I just wanted to drop a quick note to say that we've received your questions and we're currently going through them. I will strive to provide you with our responses as soon as possible.

Flags: needinfo?(manho)

What is different about your commitment now, versus at the beginning of the underlying incident? Were you not committed to following the revocation guidelines then?

Hongkong Post CA has been committed to following the BRs to protect the security and integrity of WebPKI. In this incident, our software vendor's remediation of the software bug failed to meet the agreed service level and caused a prolonged delay. We and our vendor will take this as a lesson learnt and dedicate resources to addressing system errors with the highest priority, ensuring the timely revocation of affected certificates in the future.

We have set out the plan on the following action items that aim to prevent reoccurrence of this incident (action items 1 - 3) and future revocation delays (action items 4 - 6). With the implementation of these actions, we are confident that we will be able to fulfill the BR requirements going forward.

| Action Item | Kind | Due Date |
| --- | --- | --- |
| 1. Upgrade zlint to the latest version (3.6.2) | Prevent | 2024-05-31 |
| 2. Include “pkilint” as a pre-issuance linting tool in the certificate issuance process. Both zlint and pkilint will be used in parallel. Every certificate must pass both linters before issuance. | Prevent | 2024-06-07 |
| 3. Plan for upgrade of the certificate issuance system to the latest version that enforces all mandated configuration on the certificate, including the basicConstraints extension. | Prevent | 2024-06-30 |
| 4. Educate customers on the certificate revocation requirement, train them on impact analysis for their e-services, and facilitate their preparation for a swift certificate replacement process and contingency planning for enforced certificate revocation, to minimize disruptions to e-services. | Prevent | 2024-06-30 |
| 5. (New) Update the CA operation procedure for swift certificate replacement and enforced revocation to ensure adherence to Mozilla's revocation guidelines. | Prevent | 2024-06-30 |
| 6. (New) Incorporate this BR requirement into our risk management plan to effectively manage crises resulting from enforced revocation that could potentially cause significant harm to critical infrastructure or essential e-services. | Prevent | 2024-07-31 |

And, why was a detailed list of subscribers with the rationale for an “exceptional” delay in revocation not provided? The BRs require more than a mere certificate list and summary of some subscribers’ preference to not expend “additional manpower”, in my opinion.

We will make improvements to our incident reporting process. Please find a detailed list of affected certificates along with the reasons for the delays in the attachment.

Flags: needinfo?(manho)

There needs to be a serious discussion on who holds liability for the continued issuance of certificates held in 'Critical infrastructure' noted in your attached file. It is the CA's job to issue certificates per the Baseline Requirements and Root Program policies, and then to uphold the regulations as-is including revocation within the required timeframe. The concept of resilient infrastructure is more than offering certificates, but making sure that subscribers are aware that certificates must be replaced without delay if issues are noticed.

To that end, if a third-party has decided to put lives and significant monetary damages at risk then Hongkong Post has a moral imperative to not encourage these practices by giving them certificates going forward. All that a reissued certificate is saying is that Hongkong Post are aware of the situation and implicitly agree that this is normal operating practices that should be encouraged.

I say Hongkong Post as there is a conflict at issue here where we're really talking to Certizen Ltd in this incident who are working a tender for operating and maintaining CA services on behalf of Hongkong Post. The management decisions to allow these brittle critical services to exist is risking far more serious damages when a more significant issue occurs and we find out the certificate is being used in a way that makes it impossible to replace within 24 hours. Of the parties involved only one should be figuring out the acceptable risk, and it's not the CA.

To that end, were a certificate problem report to arrive tomorrow advising that the listed delayed-revocation certificates need to be replaced within 24 hours, would it even be possible?

Flags: needinfo?(manho)

Under the terms of the subscriber agreement, Hongkong Post CA was granted the authority to revoke the certificates if the certificates are not properly issued in accordance with the CA/Browser Forum baseline requirements. During our coordination with the subscribers, they are reminded of the certificate revocation requirements that both subscribers and CAs are obliged to uphold. Besides, we have drawn up a list of action items, including action item #4, aiming to minimize the time it takes for subscribers to replace certificates. We will educate the subscribers, enhance their understanding of immediate revocation requirements, and facilitate them in preparing for a swift certificate replacement process, to ensure that the certificates can be replaced within the revocation deadline in case needed. Additionally, subscribers will also be advised to consider utilizing private PKI, and prepare other contingency plans for enforced certificate revocation to minimize disruptions to their e-services. With the implementation of these actions, we are confident that we will be able to fulfill the BR requirements going forward.

Flags: needinfo?(manho)

2024-05-30

  • 19:00 pkilint has been added as a pre-issuance linting tool in the certificate issuance process, and the existing zlint tool has been upgraded to version 3.6.2. Both zlint and pkilint will be used in parallel. Every certificate must pass both linters before issuance.
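The "must pass both linters" rule can be sketched as a simple gate. This is a minimal illustration only: the `Linter` signature, `may_issue`, and the stand-in linters are hypothetical and do not reflect the real zlint or pkilint interfaces or Hongkong Post's issuance system.

```python
from typing import Callable, Iterable, List

# Hypothetical linter signature: takes the to-be-signed certificate bytes
# and returns a list of error strings (empty list means "no findings").
Linter = Callable[[bytes], List[str]]


def may_issue(tbs_der: bytes, linters: Iterable[Linter]) -> bool:
    """Return True only if every linter reports zero errors."""
    findings = [err for lint in linters for err in lint(tbs_der)]
    return not findings


# Stand-ins for the two linters; real tools would parse the certificate.
def passing_linter(der: bytes) -> List[str]:
    return []


def failing_linter(der: bytes) -> List[str]:
    return ["basicConstraints extension not marked critical"]


print(may_issue(b"example", [passing_linter, passing_linter]))  # True
print(may_issue(b"example", [passing_linter, failing_linter]))  # False
```

Running the linters in parallel in this sense means a finding from either one blocks issuance, so a gap in one tool's coverage can be caught by the other.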

Below is a status update on the action items:

Action Item | Kind | Due Date
1. Upgrade zlint to the latest version (3.6.2). | Prevent | DONE
2. Include pkilint as a pre-issuance linting tool in the certificate issuance process. Both zlint and pkilint will be used in parallel. Every certificate must pass both linters before issuance. | Prevent | DONE
3. Plan for upgrade of the certificate issuance system to the latest version, which enforces all mandated configuration of the certificate, including the basicConstraints extension. | Prevent | 2024-06-30
4. Educate customers on the certificate revocation requirement, train them on the impact analysis to their e-services and facilitate their preparation for a swift certificate replacement process, and contingency planning for enforced certificate revocation to minimize disruptions to e-services. | Prevent | 2024-06-30
5. Update the CA operation procedure for the swift certificate replacement and the enforced revocation to ensure adherence to Mozilla's revocation guidelines. | Prevent | 2024-06-30
6. Incorporate this BR requirement into our risk management plan to effectively manage crises resulting from enforced revocation that could potentially cause significant harm to critical infrastructure or essential e-services. | Prevent | 2024-06-30

Below is a status update on the outstanding action items.

Action Item | Kind | Due Date
3. Plan for upgrade of the certificate issuance system to the latest version, which enforces all mandated configuration of the certificate, including the basicConstraints extension. | Prevent | DONE
4. Educate customers on the certificate revocation requirement, train them on the impact analysis to their e-services and facilitate their preparation for a swift certificate replacement process, and contingency planning for enforced certificate revocation to minimize disruptions to e-services. | Prevent | 2024-06-30
5. Update the CA operation procedure for the swift certificate replacement and the enforced revocation to ensure adherence to Mozilla's revocation guidelines. | Prevent | 2024-06-30
6. Incorporate this BR requirement into our risk management plan to effectively manage crises resulting from enforced revocation that could potentially cause significant harm to critical infrastructure or essential e-services. | Prevent | 2024-06-30

We have been making progress on the action items #4, #5 and #6. I will strive to promptly share the status of these action items, aiming to provide an update no later than 2024-06-30.

In bug 1886665 comment 22 you said,

... we are fully committed to complying with the baseline requirement in cases where certificate revocation is necessary within either 24 hours or 5 days, the deadlines provided by the BRs. The action items #6 and #7 are intended to raise awareness among subscribers about the need to be prepared for these cases, but not to undermine our commitment.

I am happy to see Hongkong Post making this public commitment to follow mandated revocation timelines in the future for any misissuance incident regardless of Subscriber behavior. As this statement is a general policy statement, please confirm it applies to the circumstances of this incident also.

In bug 1886665 comment 24 I suggest a new action item that codifies this commitment in a hard policy, including current and ongoing employee education of the policy. You should add that action item to this incident as well.

(In reply to Man Ho from comment #21)

Below is a status update on the outstanding action items.

Action Item | Kind | Due Date
4. Educate customers on the certificate revocation requirement, train them on the impact analysis to their e-services and facilitate their preparation for a swift certificate replacement process, and contingency planning for enforced certificate revocation to minimize disruptions to e-services. | Prevent | 2024-06-30

I'm impressed that you will be done educating customers in just a few weeks. What is the expected outcome of that education? How will you know that it was successful?

Also, how will you educate new subscribers about the potential for short-notice revocation? I assume that your terms and conditions already inform them of this, as required by the BRs--is there additional information required for the subscribers here?

Finally, I want to reiterate that educating customers should not have any impact on Hongkong Post's ability to comply with the BR requirements for revocation. Customers can make choices that have an effect on what the operational impact is for them when a certificate is revoked, but if they do not choose to mitigate that appropriately, Hongkong Post is still responsible for revoking appropriately when they have misissued certificates.

I also want to review this earlier comment from Hongkong Post for emphasis:

(In reply to Man Ho from comment #16)

Hongkong Post CA has been committed to following the BRs to protect the security and integrity of WebPKI. In this incident, the remediation of the software bug by our software vendor has failed to meet the agreed service level and caused prolonged delay.

Neither the software bug nor the software vendor's timeline for a fix are the cause of this incident. This incident was caused solely by Hongkong Post's unwillingness to meet their commitments to the BRs and root programs. If a misissuance due to software bug happens tomorrow, and the software vendor says that the fix will take two weeks, will Hongkong Post revoke within the prescribed timelines (24hr/120hr)? That is the most important commitment to make, and it is the only thing that can actually prevent a repeat of this incident.

Finally, Mozilla's delayed revocation incident response expectations include the following:

  • The decision and rationale for delaying revocation will be disclosed in the form of a preliminary incident report immediately; preferably before the BR-mandated revocation deadline. The rationale must include detailed and substantiated explanations for why the situation is exceptional. Responses similar to “we do not deem this non-compliant certificate to be a security risk” are not acceptable. When revocation is delayed at the request of specific Subscribers, the rationale must be provided on a per-Subscriber basis.
  • Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable.

In my opinion, this incident's list of affected certificates does not have a sufficiently-detailed per-Subscriber description of the rationale, which should include details about the unacceptable effects that would occur if the certificates were revoked on time. Will Hongkong Post provide this information?

And: Can you please provide some detail as to how you worked with your auditor and any applicable root programs to ensure that your risk analysis was acceptable?

[Edited to clarify per-subscriber detail request.]

Flags: needinfo?(manho)

(In reply to Tim Callan from comment #22)

I am happy to see Hongkong Post making this public commitment to follow mandated revocation timelines in the future for any misissuance incident regardless of Subscriber behavior. As this statement is a general policy statement, please confirm it applies to the circumstances of this incident also.

Yes, we are fully committed to complying with the baseline requirement in cases where certificate revocation is necessary within either 24 hours or 5 days, the deadlines provided by the BRs.

In bug 1886665 comment 24 I suggest a new action item that codifies this commitment in a hard policy, including current and ongoing employee education of the policy. You should add that action item to this incident as well.

As part of our action item #5, we are in the process of updating the CA operation procedure. This includes two main areas: (i) the swift certificate replacement, and (ii) the enforced revocation. These updates have been introduced following the analysis of two incidents involving delayed revocation (bug 1886665 and bug 1887888). At Hongkong Post CA, our officers are required to strictly follow the operational procedure as stated in our policy including that for enforced revocation, and they will receive regular training to ensure their compliance with these procedures.

To provide more clarity, let’s revise the description of action item #5 as follows:

Action Item | Kind | Due Date
5. Update the CA operation procedure for (i) the swift certificate replacement and (ii) the enforced revocation to ensure adherence to the BR revocation requirement. | Prevent | 2024-06-30
6. Incorporate this BR requirement into our risk management plan to effectively manage crises resulting from enforced revocation that could potentially cause significant harm to critical infrastructure or essential e-services. | Prevent | 2024-06-30
Flags: needinfo?(manho)

(In reply to Mike Shaver (:shaver emeritus) from comment #23)

I'm impressed that you will be done educating customers in just a few weeks. What is the expected outcome of that education? How will you know that it was successful?

Our target is to deliver educational materials on the BR revocation requirements to all customers by 2024-06-30, including new customers and customers both affected and unaffected by this incident. We will treat education as an ongoing process that helps our customers plan and prepare early for swift certificate replacement and enforced revocation. We will approach our customers and collect their feedback to confirm their understanding.

If a misissuance due to software bug happens tomorrow, and the software vendor says that the fix will take two weeks, will Hongkong Post revoke within the prescribed timelines (24hr/120hr)?

We are fully committed to revoking misissued certificates within the prescribed timeline (24hr/120hr), ensuring strict adherence to the BR revocation requirement, regardless of the software vendor's ability to resolve the bug in a timely manner.
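As a rough illustration of the 24hr/120hr arithmetic discussed here, the sketch below computes a revocation deadline and checks whether a revocation was late. It assumes, as a simplification, that the clock starts when the CA becomes aware of the problem; the function names and the timestamp are hypothetical and are not taken from the incident timeline.

```python
from datetime import datetime, timedelta, timezone

# The two windows named in the thread: 24 hours and 5 days (120 hours).
WINDOW_24H = timedelta(hours=24)
WINDOW_5D = timedelta(hours=120)


def revocation_deadline(aware_at: datetime, window: timedelta) -> datetime:
    """Latest permitted revocation time, counted from when the CA became aware."""
    return aware_at + window


def is_delayed(aware_at: datetime, revoked_at: datetime, window: timedelta) -> bool:
    """True if the revocation happened after the deadline."""
    return revoked_at > revocation_deadline(aware_at, window)


# Arbitrary illustrative timestamp, not taken from this incident.
aware = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(revocation_deadline(aware, WINDOW_5D).isoformat())  # 2024-01-06T09:00:00+00:00
```

Nothing in this calculation depends on the vendor's patch schedule or on subscriber readiness, which is the point being made in the reply above.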

In my opinion, this incident's list of affected certificates does not have a sufficiently-detailed per-Subscriber description of the rationale, which should include details about the unacceptable effects that would occur if the certificates were revoked on time. Will Hongkong Post provide this information?

It was determined that the affected customers operate critical government or public e-services, such as public hospitals, essential community health services, critical financial services, and the electronic payment infrastructure of Hong Kong. If these certificates had been revoked at that time, it could have caused outages of their critical e-services and led to substantial cumulative impacts on local users.

And: Can you please provide some detail as to how you worked with your auditor and any applicable root programs to ensure that your risk analysis was acceptable?

We have already informed our auditor about the incidents so that they can be included in the upcoming WebTrust audit report. If any non-conformities, qualifications, or modified opinions are found during the audits, we will create audit incident reports as required by CCADB. The WebTrust audit report will be submitted to CCADB for review by Root Store Operators. In the meantime, we welcome any questions from Root Store Operators regarding this incident report. We will follow up within this incident report itself.

(In reply to Man Ho from comment #25)

(In reply to Mike Shaver (:shaver emeritus) from comment #23)

I'm impressed that you will be done educating customers in just a few weeks. What is the expected outcome of that education? How will you know that it was successful?

Our target is to deliver educational materials on the BR revocation requirements to all customers by 2024-06-30, including new customers and customers both affected and unaffected by this incident. We will treat education as an ongoing process that helps our customers plan and prepare early for swift certificate replacement and enforced revocation. We will approach our customers and collect their feedback to confirm their understanding.

No, I am asking: how will you know that it is effective at preventing delayed revocation from occurring due to subscriber operational limitations? It doesn’t matter how well they understand it; what matters is how Hongkong Post will act in the case of future incidents. Subscribers don’t revoke certificates, CAs do. What about this education changes how Hongkong Post will make decisions in the future?

In my opinion, this incident's list of affected certificates does not have a sufficiently-detailed per-Subscriber description of the rationale, which should include details about the unacceptable effects that would occur if the certificates were revoked on time. Will Hongkong Post provide this information?

It was determined that the affected customers operate critical government or public e-services, such as public hospitals, essential community health services, critical financial services, and the electronic payment infrastructure of Hong Kong. If these certificates had been revoked at that time, it could have caused outages of their critical e-services and led to substantial cumulative impacts on local users.

I am asking about per-subscriber detail. A summary of the overall analysis is not the same thing. “It would be bad” is not a cogent analysis of risk on a per-subscriber basis, and does not describe how Hongkong Post incorporated risk to the web PKI from non-compliance into its analysis.

Why has Hongkong Post repeatedly refused to provide this information, as clearly required by Mozilla policies and as requested specifically in this incident?

And: Can you please provide some detail as to how you worked with your auditor and any applicable root programs to ensure that your risk analysis was acceptable?

We have already informed our auditor about the incidents so that they can be included in the upcoming WebTrust audit report.

Please answer the question. The Mozilla delayed revocation incident response policy clearly requires consultation with auditors and root programs to determine appropriateness of analysis and remediation plan, as part of the determination to delay revocation. I asked how you worked (past tense) with those parties, as described in the Mozilla policy, not about future audits.

Flags: needinfo?(manho)

2024-06-28:

  • 15:42 Educational materials regarding the BR revocation requirements and, in addition, a request for secondary contact information for timely communication have been distributed to all customers, including new customers as well as those affected and unaffected by this incident.

Additionally, we have also prepared new CA operation procedures for (i) the swift certificate replacement and (ii) the enforced revocation to ensure adherence to BR revocation requirement stated in action item #5. At Hongkong Post CA, our officers are required to strictly follow these operation procedures, and they will receive regular training to ensure their compliance with these procedures. Furthermore, our risk management plan has been updated to incorporate the BR revocation requirement mentioned in action item #6, enabling us to effectively manage crises resulting from enforced revocation.

Below is a status update on the outstanding action items.

Action Item | Kind | Due Date
4. Educate customers on the certificate revocation requirement, train them on the impact analysis to their e-services and facilitate their preparation for a swift certificate replacement process, and contingency planning for enforced certificate revocation to minimize disruptions to e-services. | Prevent | DONE
5. Update the CA operation procedure for (i) the swift certificate replacement and (ii) the enforced revocation to ensure adherence to the BR revocation requirement. | Prevent | DONE
6. Incorporate this BR requirement into our risk management plan to effectively manage crises resulting from enforced revocation that could potentially cause significant harm to critical infrastructure or essential e-services. | Prevent | DONE

By the way, we keep monitoring this bug for further comments or questions. We will follow up within this bug itself.

(In reply to Mike Shaver (:shaver emeritus) from comment #26)

I just wanted to drop a note to say that we've received your questions and we're currently going through them. I will strive to provide you with our responses by 2024-07-03.

Attachment #9404302 - Attachment is obsolete: true

(In reply to Mike Shaver (:shaver emeritus) from comment #26)

No, I am asking: how will you know that it is effective at preventing delayed revocation from occurring due to subscriber operational limitations? It doesn’t matter how well they understand it; what matters is how Hongkong Post will act in the case of future incidents. Subscribers don’t revoke certificates, CAs do. What about this education changes how Hongkong Post will make decisions in the future?

Action item #4 regarding this education aims to raise awareness among subscribers about the need to be prepared for cases where certificate revocation is required within either 24 hours or 5 days, but not to undermine our commitment to complying with the baseline requirement. Therefore, we will not consider subscriber operational limitation as an acceptable reason to delay revocation.

I am asking about per-subscriber detail. A summary of the overall analysis is not the same thing. “It would be bad” is not a cogent analysis of risk on a per-subscriber basis, and does not describe how Hongkong Post incorporated risk to the web PKI from non-compliance into its analysis.

Why has Hongkong Post repeatedly refused to provide this information, as clearly required by Mozilla policies and as requested specifically in this incident?

The determination was made on a per-subscriber basis, as outlined in the newly attached file, which gives the reason for each delay. If these certificates had been revoked at that time, it would have prevented local users from accessing a wide range of essential government e-services, public hospitals, community health services, critical financial services, and electronic payment services. The per-certificate details have been updated in the attachment.

However, in order to enhance our incident reporting process, we value the community's feedback on the specific requirements that should be included in an incident report. This will enable us to provide detailed information about the potential impact of certificate revocation at an early stage in the future, while still maintaining our primary focus on ensuring no delays in the revocation process.

Please answer the question. The Mozilla delayed revocation incident response policy clearly requires consultation with auditors and root programs to determine appropriateness of analysis and remediation plan, as part of the determination to delay revocation. I asked how you worked (past tense) with those parties, as described in the Mozilla policy, not about future audits.

After determining that we needed to create this incident report regarding the delayed revocation, we included our analysis and remediation action plan within this incident report, which has been shared with our auditor. As the matter is still open for public discussion, it was agreed to maintain its inclusion within this incident report.

In an email communication with Chrome Root Program on 2024-03-20 regarding the original bug #1886406, I was informed that if they have any questions or concerns, they will follow up within this incident report. With this understanding, I assumed that the root programs would also follow up on this incident report.

We welcome any questions from root programs regarding this incident report. We will follow up within this incident report.

Flags: needinfo?(manho)

I would like to drop a note to say that we continue monitoring this bug for further comments or questions. We will follow up within this bug itself.

Blocks: 1911183
Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31

Acknowledged that this bug has been marked for next update 2024-10-31. We continue monitoring this bug for further comments or questions. We will follow up here.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-10-31 → [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30

We continue work on incident-reporting and compliance requirements aimed at reducing delayed revocation, so this bug will remain open until at least February 1, 2025. Meanwhile, CAs should review https://github.com/mozilla/www.ccadb.org/pull/186.

Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2024-11-30 → [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-02-01 → [ca-compliance] [leaf-revocation-delay] Next update 2025-03-03

Before closing this incident, Hong Kong Post should repeat its commitment to revoke TLS certificates in a timely manner, in accordance with section 4.9.1 of the TLS Baseline Requirements.

Mozilla acknowledges that some of Hong Kong Post’s subscribers operate under complex regulatory or bureaucratic constraints. Still, Hong Kong Post will need to provide additional Action Items aimed at handling finance-related and government-managed subscribers and ensuring that no external policies prevent timely revocation. Examples include: requiring government and finance-related entities to provide written confirmation that they can comply with revocation timelines before issuance; ensuring that they have plans for replacing certificates within 24 hours of a misissuance or security incident; and streamlined approval processes so that TLS certificates can be replaced without problematic bureaucratic approval chains.

Finally, we will need a completed Closure Summary.

Flags: needinfo?(manho)

Through the experience of dealing with mass revocation, we have gained insights on how to handle such situations better. We would like to reiterate our firm commitment to adhering to the baseline requirement for timely certificate revocation in accordance with section 4.9.1 of the TLS Baseline Requirements.

We wish to provide an update on the follow-up actions that were implemented last year and have since been established as part of our application procedure to prevent any external policies from obstructing timely revocation. The enumerated actions are outlined below:

Action Item | Kind | Due Date
7. All applicants, including government and finance-related entities, must sign the application form. The form states that HKPCA will revoke the TLS certificate within 24 hours or 5 days under the conditions stipulated in section 4.9.1 of the CPS, in alignment with section 4.9.1 of the TLS Baseline Requirements. This act signifies their agreement to the timely revocation of the TLS certificate as mandated by section 4.9.1 of the TLS Baseline Requirements. | Prevent | 2024-06-30, DONE
8. A reminder on the effective management of TLS certificates was sent to all government-related entities last year, ensuring that they have plans for replacing certificates within 24 hours of a misissuance or security incident and streamlining their approval processes as well. Receive confirmation of understanding from all the government-related entities on this TLS Baseline Requirement. | Prevent | 2025-01-10, DONE

We will prepare a Closure Summary if there are no further questions.

🥹

Flags: needinfo?(manho)

Incident Report Closure Summary

Incident Description:

  • Hongkong Post CA issued a total of 46 TLS certificates with basicConstraints not marked as critical, affecting government bureaus and departments and financial institutions in Hong Kong SAR. Despite efforts to collaborate with affected subscribers for certificate replacement, 45 certificates were not revoked within 5 days as per TLS BR. If these certificates are revoked without proper coordination and assured completion of replacement TLS certificates by the affected subscribers, it could lead to substantial cumulative impacts on the sustainable delivery of critical e-services by the government.

Incident Root Cause(s):

  • The delay in revocation was primarily attributed to the manual management of affected certificates by individual subscribers and the lack of unified certificate management solutions among major subscribers. Additionally, a delay in the system vendor's patch for an issue in which the criticality flag of the basicConstraints extension was overridden by the customer-provided CSR hindered timely replacement. Together, these factors delayed revocation of the affected certificates.
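The CSR-override problem described above can be illustrated with a small sketch in which the CA's profile, not the CSR, decides criticality. The data model (`Ext`, `MANDATED_CRITICALITY`, `apply_profile`) is entirely hypothetical and is not the vendor's implementation or the patch that was applied.

```python
from typing import Dict, NamedTuple


class Ext(NamedTuple):
    critical: bool
    value: str


# Hypothetical CA profile: extension name -> criticality the CA mandates.
MANDATED_CRITICALITY: Dict[str, bool] = {"basicConstraints": True}


def apply_profile(requested: Dict[str, Ext]) -> Dict[str, Ext]:
    """Copy the requested extensions, overriding criticality where the profile mandates it."""
    issued: Dict[str, Ext] = {}
    for name, ext in requested.items():
        # The profile wins; the CSR's flag is used only when the profile is silent.
        critical = MANDATED_CRITICALITY.get(name, ext.critical)
        issued[name] = Ext(critical, ext.value)
    return issued


# A CSR that asks for a non-critical basicConstraints extension.
csr_extensions = {"basicConstraints": Ext(critical=False, value="CA:FALSE")}
issued = apply_profile(csr_extensions)
print(issued["basicConstraints"].critical)  # True
```

The design choice shown is that values supplied by the subscriber are treated as requests, while anything the profile mandates is set by the CA.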

Remediation Description:

  • To address the issues and prevent future incidents, we have completed action items including upgrading linting tools, strengthening staff training, educating subscribers on revocation requirements, and applying system patches to enforce mandated configurations. We have also enhanced operational procedures for swift certificate replacement and enforced revocation to ensure compliance with the BR revocation requirements.

Commitment Summary:

  • We are committed to complying with the baseline requirement for timely certificate revocation in accordance with section 4.9.1 of the TLS Baseline Requirements. We are also committed to preventing recurrence of similar incidents and ensuring compliance with the industry standards of CCADB, web browsers, and the CA/Browser Forum to maintain trust within the WebPKI community.

Since all the action items mentioned in this Incident Report have been taken care of, we kindly ask for it to be closed.

Flags: needinfo?(bwilson)

I will close this on Friday, 28-Feb-2025, unless there are questions or issues to discuss.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED