Open Bug 1532559 Opened 7 months ago Updated 4 days ago

CFCA: Wrong SerialNumber encoding

Categories

(NSS :: CA Certificate Compliance, task)

task
Not set

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: michel, Assigned: jonathansshn, NeedInfo)

Details

(Whiteboard: [ca-compliance] )

User Agent: Mozilla/5.0 (Android 9; Mobile; rv:65.0) Gecko/65.0 Firefox/65.0

Steps to reproduce:

I found this certificate issued by CFCA that seems to have an issue with the SerialNumber encoding:
https://crt.sh/?id=575911133&opt=cablint,zlint

Jonathan, Please look into this issue, and provide an incident report as described here:
https://wiki.mozilla.org/CA/Responding_To_An_Incident#Incident_Report

Assignee: wthayer → jonathansshn
QA Contact: kwilson → wthayer
Whiteboard: [ca-compliance]
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(jonathansshn)

This certificate was marked revoked on 2019-06-29. However, there is still no incident report, four months on.

Emailed POCs on 2019-07-04 regarding this issue, highlighting https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed

(In reply to Kathleen Wilson from comment #1)

  1. Problem Report:
    CFCA became aware of this problematic certificate through Michel Le Bihan's email report on March 5, 2019.
  2. Timeline:
    June 5, 2018: We noticed that there might be a problem with the SerialNumber encoding format of the certificate; our R&D staff later confirmed this.
    June 21, 2018: Our R&D department immediately fixed the problem.
    August 2018: We updated the CA system to fix the problem.
  3. Statement:
    CFCA has stopped issuing certificates with this problem.
  4. Summary:
    CFCA issued only one certificate with the incorrect SerialNumber encoding, and this certificate has been revoked.
  5. Certificate Data:
    Please visit https://crt.sh/?id=575911133&opt=cablint,zlint to check the data.
  6. Explanation:
    This problem was caused by a bug in the CA system, which we fixed before we were informed.
  7. Steps:
    We updated the CA system to fix the function; this was completed in August 2018.

Thank you for providing the incident report. If CFCA knew about this misissuance back in 2018, please explain why it wasn't reported and revoked at that time, and what CFCA will do to prevent failure to report future misissuances? Also, since the failure to revoke is a violation of BR section 4.9.1, I will expect this issue to appear on CFCA's next audit report. Has CFCA notified your auditor?

Flags: needinfo?(sunny_bxl)

(In reply to Wayne Thayer [:wayne] from comment #5)

Hi Wayne,

When the problem occurred, our R&D department examined it and confirmed that it was caused by the use of UTF-8 characters. We identified the problem as a system bug, so we did not file an incident report. After issuing the incorrect certificate, we contacted China Construction Bank. They told us that the certificate could only be changed in the next window period, so it was not revoked immediately.

Because we had not submitted such a report before, we were not clear about the expected reporting level and scope. I apologize for not submitting the report on time. We will learn more about the reporting process to prevent failures to report future misissuances.

Every year, CFCA informs its auditor of updates to our system. We have notified our auditor about this issue, and we will submit these issues and reports to them in the coming audit.

Flags: needinfo?(sunny_bxl)

Oliver:

Thanks for providing additional details. However, I do want to highlight that there's a difference between delaying revocation and seemingly not taking any steps to revoke. It's unclear whether the "next window period" following 2018-06-05 is 2019-06-29, but delaying the revocation of a certificate by a year is never acceptable.

Part of Wayne's question in Comment #5 is trying to understand what steps are being taken to

  1. Ensure this issue does not repeat
  2. Ensure that revocation is never delayed

For example, CAs are required to include conditions within their Subscriber Agreements/Terms of Use that ensure they can revoke certificates in the BR-required time, so that they do revoke such certificates in time, regardless of customer, change window, or any other external factor.

  • Do your Subscriber Agreements/Terms of Use include such clauses, as required by the BRs?
  • What steps will you take to ensure they are adhered to in the future? For example:
    • As has been pointed out to and by other CAs, communicating to all of your Subscribers to remind them of all CAs' obligations to revoke, to ensure their customers are aware that CAs cannot delay revocation.
    • Supporting forms of automated issuance, so that even if a certificate needs to be revoked, it can be easily replaced.
    • Depending on the validation method used, potentially ensuring multiple validation methods are used or are usable, such that a compliance issue in one validation method does not impact the other.
  • Since you responded that it was seen as a system issue, have any and all system issues in the past year been disclosed within the Management's Assertion portion of your audit reports? If not, will you examine your records to:
    • Determine if there were other system issues
    • Determine if they are incidents (optional). If you're unsure whether or not a system issue represents an incident, the best approach for the community is to err on the side of caution and assume that it does.
    • File Incident Reports for these issues to ensure an appropriate level of transparency
    • Ensure your auditor is aware of these issues and that Management discloses these within its assertion (WebTrust has example illustrative documents)
Flags: needinfo?(sunny_bxl)

(In reply to Ryan Sleevi from comment #7)

After we learned of the problem, we communicated with our customer, explained the problem, and asked them to allow timely revocation. However, they replied that the certificate had been deployed on a very important system, that they did not think the problem affected its use, and that they were not allowed to make changes in the short term. At the time, our staff did not think this was a serious incident: we mainly serve Chinese enterprises, most of whose systems need to support Chinese characters, so we had been using UTF-8 characters. As a result, we did not revoke the certificate in a timely manner.
This issue has been resolved and in theory will not occur again, but out of caution we will conduct a series of in-depth self-inspections covering our systems, business processes, and rules to determine whether there are any other issues. In the future, before customers apply for certificates, we will remind them to read the agreement, ensure that they clearly understand CAs' obligations to revoke, and remind them to make preparations in advance.
We have started the construction of automatic audit and issuance system, to enhance the efficiency and fault tolerance mechanism of the system. After assessment, we expect this to take a long time; optimistically, it may be completed before the end of this year.
We have notified our auditor about this issue, and we will submit these issues and reports to them in the coming audit.
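To make the encoding point in the comment above concrete: in DER, a PrintableString (universal tag 19, byte 0x13) and a UTF8String (universal tag 12, byte 0x0C) holding the same ASCII text differ only in the tag byte, which is easy to miss on inspection, while Chinese characters cannot be represented in a PrintableString at all. The following is a minimal Python sketch under stated assumptions: it handles only short-form DER lengths (fewer than 128 content bytes), and `der_encode_string` and the constants are hypothetical names, not part of any CA software discussed in this bug.

```python
# ASN.1 universal tag bytes for the two string types (X.680 / X.690).
PRINTABLE_STRING_TAG = 0x13  # universal tag 19
UTF8_STRING_TAG = 0x0C       # universal tag 12

# Characters permitted in a PrintableString.
PRINTABLE_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 '()+,-./:=?"
)


def der_encode_string(tag: int, value: str) -> bytes:
    """DER-encode a primitive string as tag || length || contents (short-form length only)."""
    if tag == PRINTABLE_STRING_TAG:
        if not set(value) <= PRINTABLE_CHARS:
            raise ValueError("value cannot be represented as a PrintableString")
        contents = value.encode("ascii")
    elif tag == UTF8_STRING_TAG:
        contents = value.encode("utf-8")
    else:
        raise ValueError("only PrintableString and UTF8String are handled in this sketch")
    if len(contents) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(contents)]) + contents


print(der_encode_string(PRINTABLE_STRING_TAG, "CN").hex())  # 1302434e
print(der_encode_string(UTF8_STRING_TAG, "CN").hex())       # 0c02434e
```

The two printed encodings differ only in their first byte. Calling `der_encode_string(PRINTABLE_STRING_TAG, "中国")` raises `ValueError`, since those characters are outside the PrintableString alphabet.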

Flags: needinfo?(sunny_bxl)

Thanks. This explains a bit more about the revocation delays, and the steps being taken to address revocation delays in the future. This addresses the questions raised in Comment #5 about the delays in revocation.

As it relates to the issue with the originally reported certificate, I'm still having trouble understanding the picture about how this issue happened or how it's being prevented.

For example, Comment #4 / Comment #8 talk about it in the context of SerialNumber. However, if you look at the link provided, you will see that there are also encoding issues in the id-evat-jurisdiction-countryName field being reported there, and I don't see a similar acknowledgement or report here. Will you be addressing that issue in this bug, including an analysis of what is wrong, why it's wrong, how it happened, and what steps, both technical and process-wise, are being taken to address the underlying issues that contributed to it being issued?

Flags: needinfo?(sunny_bxl)

(In reply to Ryan Sleevi from comment #9)

Hi Ryan,

Sorry for the late reply; I have been busy with our internal audit, which is still in progress.
The issue occurred because we used to use UTF-8 characters, and there were some space characters in the values that are hard for a person to spot.
When the issue occurred, we re-read RFC 5280 and then modified the encoding of the serial number and the jurisdiction of incorporation countryName in accordance with RFC 5280. We updated the CA to add a check for space characters, and this issue has now been fixed.
We will pay more attention to changes to RFC 5280 and other requirements. Thanks for your help.
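A pre-issuance check of the kind described in the comment above could look like the following minimal Python sketch. It assumes the profile constraints as we read them: RFC 5280 Appendix A defines X520SerialNumber as PrintableString (SIZE (1..64)), and the EV Guidelines define jurisdictionCountryName as PrintableString (SIZE (2)). The whitespace rule is included only because the comment mentions space characters. The `PROFILE` table, `lint_attribute` function, and sample values are hypothetical and invented for illustration; they are not CFCA's implementation or data. Linters such as cablint and zlint, whose results are linked from the crt.sh report, perform checks of this kind.

```python
# ASN.1 universal tag bytes for the two string types involved (X.680).
PRINTABLE_STRING_TAG = 0x13
UTF8_STRING_TAG = 0x0C

# Characters permitted in a PrintableString.
PRINTABLE_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 '()+,-./:=?"
)

# Hypothetical profile: attribute name -> (required tag, min length, max length).
PROFILE = {
    # RFC 5280 Appendix A: X520SerialNumber ::= PrintableString (SIZE (1..64))
    "serialNumber": (PRINTABLE_STRING_TAG, 1, 64),
    # EV Guidelines: jurisdictionCountryName is PrintableString (SIZE (2))
    "jurisdictionCountryName": (PRINTABLE_STRING_TAG, 2, 2),
}


def lint_attribute(name: str, tag: int, value: str) -> list[str]:
    """Return the problems found in one subject attribute; an empty list means it passes."""
    required_tag, min_len, max_len = PROFILE[name]
    problems = []
    if tag != required_tag:
        problems.append(
            f"{name}: string type tag 0x{tag:02X}, expected 0x{required_tag:02X}"
        )
    disallowed = sorted(set(value) - PRINTABLE_CHARS)
    if disallowed:
        problems.append(f"{name}: characters not allowed in PrintableString: {disallowed}")
    if not min_len <= len(value) <= max_len:
        problems.append(f"{name}: length {len(value)} outside {min_len}..{max_len}")
    # Extra rule motivated by the space characters mentioned in this bug.
    if value != value.strip():
        problems.append(f"{name}: leading or trailing whitespace")
    return problems


# Refuse to issue if any checked attribute has a problem (sample values are invented).
subject = [
    ("serialNumber", PRINTABLE_STRING_TAG, "91110000100000000X"),
    ("jurisdictionCountryName", UTF8_STRING_TAG, "CN "),
]
all_problems = [p for attr in subject for p in lint_attribute(*attr)]
if all_problems:
    print("blocked:", *all_problems, sep="\n  ")
```

Here the invented jurisdictionCountryName value `"CN "` is rejected three times over: wrong string type, a length of 3, and a trailing space. Running every rule on every attribute, rather than stopping at the first failure, gives reviewers the full picture of what is wrong.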

Flags: needinfo?(sunny_bxl)

Thank you for acknowledging the issue. However, as a response, it does not really demonstrate why the issue occurred, and thus does not really provide assurance that similar issues won’t occur.

Understanding how something like this occurred is extremely important to understanding the steps being taken. At best, the current response feels like “We made a mistake, we fixed it, and we won’t make mistakes again”.

A useful explanation is to talk about how the system was designed, what controls you had then to ensure compliance, what teams were involved in reviewing the code and implementation, and what’s being done. It also helps to explain what CA software you use, how it is designed and maintained, and how you approach compliance with RFC 5280.

This isn’t about assigning blame or attempting to shame. It’s about trying to understand the full picture of how this happened, so we can have confidence it won’t happen again, and so we can help ensure other CAs do not make the same or similar mistakes. We want to learn from this issue, and we want to be confident that there’s a clear understanding about how this happened.

Given that RFC 5280 and the Baseline Requirements spell out the technical requirements very clearly, understanding how this was missed - not just that it was missed - is key. The only explanation we have to date is that folks thought UTF would be good. However, if the level of controls in the CA software is that fluid, and if trying to improve things for local users is that important, how can we be assured that, say, sometime in the future someone won’t be handed a sub-CA cert? Maybe due to someone misconfiguring a profile, if users can change the profiles, or maybe something the CA does to help out, like it did with UTF? That’s the sort of thing we’re trying to understand, and the best thing a CA can do is comprehensively describe things.

Flags: needinfo?(sunny_bxl)

(In reply to Ryan Sleevi from comment #11)

Ryan:
Our certificate issuing system mainly consists of an RA and a CA. The RA accepts applications and conducts a preliminary review, mainly checking whether the information format is correct, such as whether the domain is included in IANA’s Root Zone Database. The CA issues the certificate and verifies the format of the issued certificate, such as its encoding. All of this follows the requirements of RFC 5280 and the Baseline Requirements.
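The RA-side preliminary check mentioned above could be sketched as follows. This assumes the check compares the domain's last label against IANA's published TLD list (the `tlds-alpha-by-domain.txt` format: a `#` header line, then one upper-case TLD per line) and that internationalized names are already in A-label (`xn--`) form. `load_tlds` and `tld_in_root_zone` are hypothetical names; this is an illustration, not CFCA's RA logic, and it deliberately takes the list as text rather than fetching it over the network.

```python
def load_tlds(text: str) -> set[str]:
    """Parse IANA's TLD list format: '#' comment lines, then one TLD per line."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def tld_in_root_zone(domain: str, tlds: set[str]) -> bool:
    """Preliminary RA check: is the domain's last label a TLD in the supplied list?"""
    labels = domain.rstrip(".").lower().split(".")
    # Require at least two labels so a bare TLD is not accepted as a domain name.
    return len(labels) >= 2 and labels[-1] in tlds


# Invented sample in the list's format; a real RA would load the current published list.
sample_list = "# Version (sample)\nCN\nCOM\nNET\n"
tlds = load_tlds(sample_list)
print(tld_in_root_zone("www.example.cn", tlds))    # True
print(tld_in_root_zone("host.internal", tlds))     # False
```

Matching on the last label only answers whether the TLD is delegated; it says nothing about domain control, which is a separate validation step.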

The system was developed by our R&D department. Functional requirements are reviewed by the technical, testing, operations, maintenance, and compliance departments. However, in our previous reviews we may have paid more attention to product functionality, which caused some omissions in how certain requirements were handled; we need to pay more attention to reviewing code and rules.

In subsequent product updates, we will strengthen code review and conduct regular training on RFC 5280 and the Baseline Requirements for new colleagues, so that more colleagues understand the rules clearly and similar incidents are avoided.

Thanks for the advice, which will greatly help us to improve ourselves.

Flags: needinfo?(sunny_bxl)

Wayne: I'm personally not confident in the proposed next steps, but I'm hoping you can take a look at this and share your thoughts. I don't really see anything that's going to improve systemically, and "train more people, harder" doesn't particularly seem like a solution that will work out.

Flags: needinfo?(jonathansshn) → needinfo?(wthayer)

Oliver: in comment #8, you state:

We have started the construction of automatic audit and issuance system, to enhance the efficiency and fault tolerance mechanism of the system.

Will you please explain what this new system will do, and explain when it will be ready?

Flags: needinfo?(wthayer) → needinfo?(sunny_bxl)