Closed Bug 1709070 Opened 3 years ago Closed 3 years ago

Taiwan-CA: Invalid stateOrProvinceName

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hcli, Assigned: hcli)

Details

(Whiteboard: [ca-compliance] [ov-misissuance])

Actual results:

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in the MDSP mailing list, a Bugzilla bug, or internal self-audit), and the time and date.

On 2021/5/3 16:44, we received an email reporting a certificate with an invalid stateOrProvinceName.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

(All times are in UTC+8)
2019/7/2 08:33 The certificate was issued.
2021/5/3 16:44 We received the reporting email.
2021/5/3 16:57 We confirmed that the certificate was mis-issued and started the investigation.
2021/5/3 17:22 The certificate was revoked.
2021/5/3 19:27 We completed the investigation and determined the scope and cause of the problem.

  3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

Our SOP contains a list of possible values for ST. Until the mitigation described in section 7 is deployed, we have asked RA operators to follow the SOP carefully when checking subject attributes, and we have reiterated that a country name must not appear in ST.

  4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

We checked all valid certificates and found 1 certificate with ST=<country name>.

  5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

https://crt.sh/?id=1640686563

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We rely on RA operators to check the validity of the subject ST because we accept many forms of the same name; for example, Tokyo, Tokyo-to, and Tokyo Metropolis are all valid forms of Tokyo. This was a human error: the problematic ST value was not identified during that check.
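
The difficulty described in this answer, that one subdivision can be written in several valid forms, can be illustrated with a small sketch. This is not TWCA's implementation: the `ST_ALIASES` table, the `canonical_st` function, and the normalization rules are hypothetical, and only the three Tokyo forms come from this report.

```python
from typing import Optional

# Hypothetical alias table: accepted written forms -> one official name.
# Only the Tokyo forms are taken from the report; a real table would be
# maintained per country by the CA.
ST_ALIASES = {
    "JP": {
        "tokyo": "Tokyo",
        "tokyo-to": "Tokyo",
        "tokyo metropolis": "Tokyo",
    },
}


def canonical_st(country: str, st: str) -> Optional[str]:
    """Return the official name for an accepted form, or None if unknown."""
    # Collapse runs of whitespace and ignore case before the lookup.
    key = " ".join(st.split()).casefold()
    return ST_ALIASES.get(country.upper(), {}).get(key)
```

Under these assumptions, `canonical_st("JP", "Tokyo-to")` returns `"Tokyo"`, while `canonical_st("JP", "Japan")` returns `None` and would need a human decision.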

  7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

To mitigate the risk of this problem, we plan to migrate to an allowlist for ST values.
I will post an update when the timeline is determined.

Assignee: bwilson → hcli
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance]

To mitigate the risk of this problem, we plan to integrate the list of acceptable stateOrProvinceName values into our RA system, so that the accuracy of the value no longer relies solely on manual validation. This enhancement is scheduled to be deployed no later than the end of August 2021.

Thanks for your quick incident report.

I have a few questions. Firstly, does TWCA subscribe to this component on Bugzilla, or does it regularly review bugs posted here? This has been a consistent issue across multiple CAs, with bugs like bug 1645686 being reported almost a year ago.

If TWCA saw these bugs being filed, did they scan their own certificates against the acceptable ST list for any possible issues? If not, why not? If so, why did it not catch this issue?

To me, it doesn't make sense why TWCA would wait for an incident affecting their own CA to start planning to implement any type of allowlist solution, especially when they already have an allowlist for acceptable ST values.

Flags: needinfo?(hcli)

Thanks for raising this, George.

I agree with your concerns. When we look at this from a systemic risk perspective, this is something that we would have expected CAs to be aware of, by monitoring incidents and examining their own systems. That this isn't happening is concerning, and trying to figure out why that's happening, individually at CAs, is I think key to understanding how the ecosystem can improve.

So as part of the incident response, I think figuring out whether or not TWCA was monitoring issues, and what the root causes of that are, should be viewed as part of the scope for this incident.

Thank you for the questions.

We do regularly review bugs posted here and have scanned our certificates for possible issues. Since last year, we have enhanced our system to block invalid characters, default values of frequently used CSR generation tools, N/A values, etc. However, we overlooked the ST=<country name> case because we mainly issue certificates for entities in Taiwan. This time we manually checked all valid certificates where C != TW and confirmed that only the reported certificate contains invalid subject attributes.
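
As an illustration of the kinds of checks listed in this comment, here is a minimal sketch of an ST lint that also covers the ST=<country name> case. It is not TWCA's system: the function name `st_problems`, the sample sets, and the permitted-character rule are assumptions made for the example. `Some-State` is the default stateOrProvinceName offered by OpenSSL's stock configuration.

```python
from typing import List

# Assumed example data, not TWCA's real rule set.
PLACEHOLDER_VALUES = {"n/a", "na", "none", "-", "."}
TOOL_DEFAULTS = {"some-state"}  # OpenSSL's default stateOrProvinceName
COUNTRY_NAMES = {
    "JP": {"japan"},
    "TW": {"taiwan"},
    "US": {"united states", "usa"},
}


def st_problems(country: str, st: str) -> List[str]:
    """Return the reasons an ST value should be blocked (empty if none)."""
    problems = []
    value = " ".join(st.split())  # collapse runs of whitespace
    folded = value.casefold()
    if not value or folded in PLACEHOLDER_VALUES:
        problems.append("placeholder or empty value")
    if folded in TOOL_DEFAULTS:
        problems.append("default value of a CSR generation tool")
    # Assumed rule: letters of any script plus a few punctuation marks.
    if not all(ch.isalpha() or ch in " -'." for ch in value):
        problems.append("invalid character")
    # Compare only with the names of the country given in C.
    if folded in COUNTRY_NAMES.get(country.upper(), set()):
        problems.append("country name used as stateOrProvinceName")
    return problems
```

The country-name check compares ST only with the names of the country in C, so that a value such as ST=Georgia with C=US is not flagged.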

Regarding the list mentioned in answer 3: it only lists official names, and the RA operator determines whether the value in the application data is a form of the "official name". As I mentioned in answer 6, we accept any valid form of the same name: with or without the subdivision category, in English or Chinese characters, spaced or hyphenated when the name is in pinyin, etc. The aim was to preserve flexibility as long as the values are valid, so we decided it was not feasible to use the list as a strict restriction. The enhancement planned this time is to alert RA operators when the value is not on the list and to require extra review and approval before issuance.

Flags: needinfo?(hcli)

Thanks for the reply.

What is the reason for this flexibility in the ST field? I am not well acquainted with the provinces of Japan (where this particular certificate was issued), but would it not be feasible to have an allowlist based on the English version, plus a version in the language of the particular country the certificate is being issued to?

Can you also provide a timeline of when you created this flexible list and informed RAs to adhere to it, as well as when you started planning the alert system? Thanks.

Flags: needinfo?(hcli)

Hi George,

The registration flow was as follows: customers submit their CSR, which contains the desired subject attributes. Our system checks the CSR and fills the attributes into the application data. The application data is verified manually for organization validation and corrected if necessary. The requirement was that, as long as the value submitted by the customer is valid and correct, it is used. I agree that it would be feasible to create a list with each name in a form chosen solely by the CA, but that would not let customers choose the form of the name they want, and at that time we decided we needed this flexibility.

We have further discussed the issue and made some adjustments to the plan. The subject attribute values will no longer be populated from the CSR; normally they will be selected by RA operators from the allowlist. When a value that is not on the list is requested, issuance will require review and approval from the compliance team. Until we complete the system changes, all certificate requests with C != TW will require review and approval from the compliance team.
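
The adjusted plan can be sketched as a small routing decision. This is only an illustration under stated assumptions: `Route`, `ST_ALLOWLIST`, `route_request`, and the sample allowlist entries are hypothetical names and data, not part of TWCA's RA system.

```python
from enum import Enum
from typing import Dict, Set


class Route(Enum):
    STANDARD = "standard RA handling"
    COMPLIANCE_REVIEW = "review and approval by the compliance team"


# Hypothetical allowlist keyed by country code; the entries are examples only.
ST_ALLOWLIST: Dict[str, Set[str]] = {
    "TW": {"Taipei City", "Kaohsiung City"},
    "JP": {"Tokyo", "Osaka"},
}


def route_request(country: str, st: str, allowlist_deployed: bool) -> Route:
    """Decide who must approve a request, following the plan in this comment."""
    if not allowlist_deployed:
        # Interim measure: every request with C != TW goes to compliance.
        if country != "TW":
            return Route.COMPLIANCE_REVIEW
        return Route.STANDARD
    # After deployment: values on the allowlist proceed normally,
    # anything else needs compliance approval before issuance.
    if st in ST_ALLOWLIST.get(country, set()):
        return Route.STANDARD
    return Route.COMPLIANCE_REVIEW
```

With this sketch, a request for C=JP, ST=Japan is sent to compliance review both before and after the allowlist is deployed.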

The updated timeline: (All times are in UTC+8)
2019/7/2 08:33 The certificate with the invalid stateOrProvinceName was issued.
2020/8/26 We created the list of valid subject attributes and informed RA operators, in response to observed incidents at multiple CAs and updates to the requirements.
2021/5/3 16:44 We received the reporting email.
2021/5/3 16:57 We confirmed that the certificate was mis-issued and started the investigation.
2021/5/3 17:22 The certificate was revoked.
2021/5/3 19:27 We completed the investigation and determined the scope and cause of the problem. We also decided to implement the alert function.
2021/5/5 We discussed the issues raised here and made the adjustments described above.
2021/5/6 All certificate requests with C != TW require review and approval from the compliance team.
Before 2021/8/31 The system changes described above will be deployed.

Flags: needinfo?(hcli)

Are there any updates to provide regarding your system changes? For example, have you implemented lists of allowed values for certain geographic location fields?

Flags: needinfo?(hcli)

The system change (list of allowed values) is now in the deployment process, which is currently scheduled to be completed before 7/31.
I will update here once it is completed.

Flags: needinfo?(hcli)

which is currently scheduled to be completed before 7/31

Can you please provide more meaningful detail about how this date was selected? I understand that since 2021-05-06 (per Comment #6), you've required all C != TW certificates to undergo review, but it's important to understand why it takes two months to implement CA software changes. Understanding this process, and how the timeline is derived, helps the community better understand how the CA approaches their system design and compliance, while also providing an opportunity for other CAs to learn about challenges and risks to being able to quickly implement corrective fixes.

Flags: needinfo?(hcli)

(In reply to Hao-Chun Li from comment #8)

I will update here once it is completed.

Also, please see https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/BOwcbWbZTg0/m/27F-UhedBgAJ

The expectation, as stated on https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed, is weekly updates unless and until a Mozilla representative agrees to the delay (e.g. Ben sets a Next-Update). Given that we're ~5 weeks away, providing weekly updates may be useful, depending on the answer to Comment #9.

This is why it's important to explain why something may take time: it helps better inform whether it's appropriate to set Next-Update, or if the plan involves risks that may warrant a more regular update schedule.

(In reply to Ryan Sleevi from comment #9)

which is currently scheduled to be completed before 7/31
Can you please provide more meaningful detail about how this date was selected? I understand that since 2021-05-06 (per Comment #6), you've required all C != TW certificates to undergo review, but it's important to understand why it takes two months to implement CA software changes. Understanding this process, and how the timeline is derived, helps the community better understand how the CA approaches their system design and compliance, while also providing an opportunity for other CAs to learn about challenges and risks to being able to quickly implement corrective fixes.

We made the system change (list of allowed values) part of the 2nd quarterly update of our RA system, which included other features and changes, and was scheduled to be deployed before 8/31 as I mentioned in comment #6.

We do have an emergency deployment process for hotfixes that gets changes into production in much less time. However, we evaluated the situation and decided that this change should go through the normal process because it is a supportive feature to prevent human errors rather than a critical security bug that needs an immediate fix, and because we already had extra mitigations for the problem in place.

The deployment process for regular updates includes multiple test stages conducted by the QA team and the Operation Department, which generally take 1-2 months. The development freeze for the Q2 update was scheduled for the end of June, hence the initial 8/31 deadline.
When I made comment #8, I checked the progress: the change is now being tested by the Operation Department and should reach production by 7/31.

Flags: needinfo?(hcli)

Thanks. Please ensure that details like those in Comment #11 are present in future incident reports.

Setting N-I for Ben to see about a NextUpdate of 2021-07-31

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] → [ca-compliance] Next update 2021-07-31

The system changes described in Comment #6 have been deployed to the production system.

Flags: needinfo?(bwilson)

Dear Hao-Chun,
Does your Comment #13 mean that you have fully accomplished the remediation steps to prevent this incident from re-occurring?
Thanks,
Ben

Flags: needinfo?(hcli)
Whiteboard: [ca-compliance] Next update 2021-07-31 → [ca-compliance] Next update 2021-08-15

(In reply to Ben Wilson from comment #14)

Dear Hao-Chun,
Does your Comment #13 mean that you have fully accomplished the remediation steps to prevent this incident from re-occurring?
Thanks,
Ben

Yes, we have fully accomplished the remediation steps.

Flags: needinfo?(hcli)

I intend to close this on Friday, 13-Aug-2021.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update 2021-08-15 → [ca-compliance] [ov-misissuance]