Open Bug 1463975 Opened Last year Updated 20 days ago

GRCA: Misissued certificates: Invalid commonName, commonName not in SAN

Categories: NSS :: CA Certificate Compliance (task)
Tracking: Not tracked
Status: ASSIGNED
People: Reporter: ryan.sleevi; Assigned: gpki; NeedInfo
Details: Whiteboard: [ca-compliance]
Attachments (3 files):
  • 24.11 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • 23.13 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • 44.67 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Example certs:
https://crt.sh/?id=454582309&opt=zlint,x509lint,cablint
https://crt.sh/?id=454680387&opt=zlint,x509lint,cablint

These certificates improperly concatenate multiple SANs into the commonName, using semicolons.
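A minimal sketch of the kind of check that flags these certificates. It assumes the subject commonName and the SAN dNSName values have already been extracted from the certificate with a parsing library (not shown), and it ignores IP-address SANs. The Baseline Requirements require that a commonName, if present, contain a single FQDN or IP address that is also one of the subjectAltName values. The function name `common_name_problems` and its messages are hypothetical; they are not output from zlint, x509lint, or cablint.

```python
import re
from typing import List

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen,
# at most 63 characters. A bare "*" label is also accepted so wildcard names pass.
LABEL = re.compile(r"^(\*|[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)$")


def common_name_problems(common_name: str, san_dns_names: List[str]) -> List[str]:
    """Return problems with the commonName relative to the SAN dNSNames (hypothetical helper)."""
    problems = []
    if len(common_name) > 64:  # RFC 5280 upper bound (ub-common-name) for commonName
        problems.append("commonName longer than 64 characters")
    if not all(LABEL.match(label) for label in common_name.split(".")):
        problems.append("commonName is not a single valid hostname")
    if common_name.lower() not in {name.lower() for name in san_dns_names}:
        problems.append("commonName not in SAN")
    return problems


# A semicolon-joined CN like the one in the example certificates fails two checks.
print(common_name_problems("a.example.com;b.example.com", ["a.example.com", "b.example.com"]))
# → ['commonName is not a single valid hostname', 'commonName not in SAN']
```

The hostnames above are illustrative only; a real check would run on the parsed certificate rather than on hand-typed strings.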
Whiteboard: [ca-compliance]
Please reply in this bug very promptly to acknowledge that you have been informed of this bug, and provide a timeline for resolving the concern.

Then please provide an incident report in this bug, as described here:
https://wiki.mozilla.org/CA/Responding_To_A_Misissuance#Incident_Report
Flags: needinfo?(gpki)
Assignee: wthayer → gpki
Emailed GRCA POCs requesting an immediate response to this bug.

We found this issue and stopped the multi-domain certificate service on May 7th, 2018, and the issue was fixed on May 10th, 2018. The number of affected certificates issued between Jan. 7th and May 7th is 88. These affected certificates will be revoked by Feb. 28, 2019.
I will provide the misissuance incident report in a few days.

Flags: needinfo?(gpki)

Thanks for acknowledging.

There are a few things concerning, even in that brief response:

  • https://wiki.mozilla.org/CA/Responding_To_An_Incident calls out the need for prompt incident reports. I understand this issue is 8 months old, but GRCA was contacted nearly a week ago. I'd be concerned that it will still take a few days for the incident report.
  • The incident report will need to clarify what factors you considered in your selection of a revocation date so far beyond what the BRs permit, consistent with the aforementioned page.
  • The incident report will need to explain the delta between 2018-05-07 and the filing of the incident report. That is, we'll want to understand not just how this issue happened (with multi-domain validation), but why the process for responding to or reporting incidents failed in this way.
Flags: needinfo?(gpki)

Provide the Incident Report

Flags: needinfo?(gpki)

Provide the affected certificates lists

Comment on attachment 9035570
Incident Report[bug1463975].docx

Pasting the details from this, for folks who have trouble opening the .docx. It appears to have been translated, and I've made slight changes to the format to make it easier to read in Bugzilla.

  • How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

GCA started issuing a multi-domain certificate for a test server on Jan. 7th, 2018, and launched the multi-domain certificate service on Mar. 23rd, 2018. These certificates contain multiple domain names in the CN field. Internal staff found this issue on May 7th and fixed it on May 10th. Since then, the CN field contains only one domain name.

The subscribers of the 88 affected certificates issued between Jan. 7th and May 7th have been asked to apply for new certificates. Because these certificates are used by hundreds of government agencies and their web servers, the process of applying for new certificates would take longer than we expected. The subscribers asked for additional time to apply for new certificates and replace the old ones before we revoke them. After evaluating the security risk and the subscribers' processing time, we decided to revoke all affected certificates by Feb. 28, 2019. As of Dec. 31, 2018, 20.5% of the affected certificates had been re-issued.

GCA became aware of this issue through our technicians and fixed the problem on May 10th, but our point of contact (Mr. Hung-Yu, Hsu) did not receive the notice about this bug [1463975] until Mr. Wayne Thayer emailed him on Jan. 5th, 2019. As a result, we missed the reporting deadline. There may be something wrong with our mailbox (gpki@ndc.gov.tw).

  • A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Jan. 7th, 2018: Issued a multi-domain certificate for a test server
Mar. 23rd, 2018: Started the multi-domain certificate service
May 7th, 2018: Found this issue and stopped the multi-domain certificate service
May 10th, 2018: Fixed this issue and resumed the multi-domain certificate service
Feb. 28th, 2019: Revoke all certificates affected by this issue
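To put the timeline above in numbers, here is a short date-arithmetic sketch using Python's standard `datetime` module. The dates are copied from the timeline, and the last one is the planned (not actual) revocation date.

```python
from datetime import date

# Dates from the timeline above; the last one is the planned revocation date.
first_issued = date(2018, 1, 7)        # first multi-domain certificate (test server)
detected = date(2018, 5, 7)            # issue found, service stopped
fixed = date(2018, 5, 10)              # issue fixed, service resumed
planned_revocation = date(2019, 2, 28)

intervals = {
    "first issuance to detection": (detected - first_issued).days,
    "detection to fix": (fixed - detected).days,
    "detection to planned revocation": (planned_revocation - detected).days,
}
for label, days in intervals.items():
    print(f"{label}: {days} days")
```

Run as written, this prints 120, 3 and 297 days respectively, which is the gap between detection and revocation that the questions in this bug focus on.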

  • Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

The multi-domain certificate service was stopped on May 7th, 2018, and the issue was fixed on May 10th, 2018. The number of affected certificates issued between Jan. 7th and May 7th is 88. These affected certificates will be revoked by Feb. 28, 2019.

  • A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

The number of affected certificates issued between Jan. 7th and May 7th, 2018 is 88.

  • The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

All affected certificates are listed in the attachment.

  • Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

The CA started issuing multi-domain certificates on Jan. 7th, 2018, and the technicians failed to check the format of the multi-domain certificates.

  • List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Step 1: Found the issue on May 7th, 2018
Step 2: Stopped the multi-domain certificate service on May 7th, 2018
Step 3: Fixed the issue on May 10th, 2018
Step 4: Revoke all affected certificates by Feb. 28, 2019

Thank you for providing this incident report. I have the following questions:

  • Please provide a root cause analysis for this problem. The answer "the technicians failed to check the format of the multi-domain certificates" is not sufficient. For example: Why did the technicians miss the problem? Why was it not caught in testing?
  • Please add a description of what steps GRCA will take to prevent a similar issue from occurring in the future to your answer to question #7. For example, will more testing be performed? Will pre- or post-issuance linting be implemented?
  • Mozilla expects both the misissuance and the failure to revoke within the time required by the BRs to be qualifications that appear on the next GRCA BR audit statement. Have you informed your auditor of both of these issues?
  • Please explain what you will do to ensure that future incident emails from Bugzilla will be received and promptly responded to.
Flags: needinfo?(gpki)

To add to Wayne's comments:

> These certificates contain multiple domain names in the CN field. Internal staff found this issue on May 7th and fixed it on May 10th.

This sounds like you failed to file an incident report, and the incident report that was filed was after they were detected in the wild. Please consider the failure to report as an incident, provide a root cause analysis of this incident, and provide a description of what steps GRCA will take to prevent such non-reporting in the future.

> The subscribers of the 88 affected certificates issued between Jan. 7th and May 7th have been asked to apply for new certificates. Because these certificates are used by hundreds of government agencies and their web servers, the process of applying for new certificates would take longer than we expected.
> As of Dec. 31, 2018, 20.5% of the affected certificates had been re-issued.

This is deeply concerning.

  1. Please provide details about the 20.5% affected certificates being reissued. Specifically, please provide details about the old certificate and the replacement certificate.
  2. Given that only 20.5% were replaced by 2018-12-31, please indicate what GRCA "expects" the time to be to replace certificates. That is, it would appear, based on this response and current progression, that not replacing for 8 months was "expected", but not replacing for 12 months is "not expected". Clarity about what constitutes "expected" timelines would be useful, as well as any factors that contribute to that expectation.
  3. There's a seeming disconnect between 88 certificates being used by hundreds of government agencies. Can you explain how a single certificate can be used by multiple government agencies and their web servers?

(In reply to Wayne Thayer [:wayne] from comment #8)

> Thank you for providing this incident report. I have the following questions:

>   • Please provide a root cause analysis for this problem. The answer "the technicians failed to check the format of the multi-domain certificates" is not sufficient. For example: Why did the technicians miss the problem? Why was it not caught in testing?

Based on the BR update stating that the CN field is deprecated, the technicians thought that the CN field was no longer in effect. To make certificates easier to look up, the technicians put all FQDNs in the CN field, separated by semicolons. The technicians tested the certificates with browsers, but not with automated certificate checking tools. We will improve our process by adding an automated certificate check before any new type of certificate is issued.

>   • Please add a description of what steps GRCA will take to prevent a similar issue from occurring in the future to your answer to question #7. For example, will more testing be performed? Will pre- or post-issuance linting be implemented?

New types of certificates will be checked with cablint/x509lint/zlint. We will change our certificate system update process to include this check.

>   • Mozilla expects both the misissuance and the failure to revoke within the time required by the BRs to be qualifications that appear on the next GRCA BR audit statement. Have you informed your auditor of both of these issues?

The audit period ended in April 2018, and this incident happened in May 2018. We have informed our auditor of these issues.

>   • Please explain what you will do to ensure that future incident emails from Bugzilla will be received and promptly responded to.

We will set up automatic forwarding of mail from gpki@ndc.gov.tw to our point of contact (Mr. Hung-Yu, Hsu). Alternatively, perhaps Bugzilla could notify all the contacts listed in the CCADB.
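The answers above commit to checking new certificate types with cablint/x509lint/zlint before issuance. The following sketch shows where such a gate could sit in the issuance flow. It assumes a simplified, hypothetical `ProposedCert` record and hypothetical check functions that stand in for actually invoking those linters; their command-line interfaces are not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProposedCert:
    """Hypothetical, simplified view of a certificate before it is signed."""
    common_name: str
    san_dns_names: List[str] = field(default_factory=list)


# A check returns a list of error messages; an empty list means it passed.
Check = Callable[[ProposedCert], List[str]]


def cn_has_no_separators(cert: ProposedCert) -> List[str]:
    # Flags list separators that suggest several names were packed into one CN.
    return [f"commonName contains {sep!r}" for sep in (";", ",", " ") if sep in cert.common_name]


def san_not_empty(cert: ProposedCert) -> List[str]:
    return [] if cert.san_dns_names else ["no SAN dNSName entries"]


def pre_issuance_gate(cert: ProposedCert, checks: List[Check]) -> Dict[str, List[str]]:
    """Run every check and collect failures; sign only if the result is empty."""
    failures: Dict[str, List[str]] = {}
    for check in checks:
        errors = check(cert)
        if errors:
            failures[check.__name__] = errors
    return failures


bad = ProposedCert("a.example.com;b.example.com", ["a.example.com", "b.example.com"])
print(pre_issuance_gate(bad, [cn_has_no_separators, san_not_empty]))
```

Collecting every failure rather than stopping at the first one gives the operator a complete picture before deciding whether to sign.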
Flags: needinfo?(gpki)

(In reply to Ryan Sleevi from comment #9)

> To add to Wayne's comments:

> > The subscribers of the 88 affected certificates issued between Jan. 7th and May 7th have been asked to apply for new certificates. Because these certificates are used by hundreds of government agencies and their web servers, the process of applying for new certificates would take longer than we expected.
> > As of Dec. 31, 2018, 20.5% of the affected certificates had been re-issued.

> This is deeply concerning.

>   1. Please provide details about the 20.5% affected certificates being reissued. Specifically, please provide details about the old certificate and the replacement certificate.

We asked the subscribers to apply for new certificates with new key pairs, so the FQDNs in the new certificates may differ. Some subscribers do not actually use their certificates and may not apply for new ones (25 certificates are currently not installed on any web server). The mapping of old certificates to new certificates is listed in the attachment. We will update this list continuously.
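The old-to-new certificate mapping described above also makes the reported percentage checkable. A minimal sketch, assuming a hypothetical dictionary from old certificate IDs to replacement IDs, with `None` where no replacement exists yet:

```python
from typing import Dict, Optional


def replacement_progress(mapping: Dict[str, Optional[str]]) -> float:
    """Percentage of old certificates that have a recorded replacement."""
    if not mapping:
        return 0.0
    replaced = sum(1 for new_id in mapping.values() if new_id is not None)
    return 100.0 * replaced / len(mapping)


# Hypothetical IDs: 88 affected certificates, of which the first 18 are replaced.
mapping = {f"old-{i}": (f"new-{i}" if i < 18 else None) for i in range(88)}
print(round(replacement_progress(mapping), 1))
```

With 88 affected certificates, 18 replacements give 18/88 ≈ 20.45%, which rounds to the 20.5% reported for Dec. 31, 2018; 17 or 19 would give about 19.3% or 21.6%. So, if the figure was computed this way, it corresponds to 18 certificates.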

>   2. Given that only 20.5% were replaced by 2018-12-31, please indicate what GRCA "expects" the time to be to replace certificates. That is, it would appear, based on this response and current progression, that not replacing for 8 months was "expected", but not replacing for 12 months is "not expected". Clarity about what constitutes "expected" timelines would be useful, as well as any factors that contribute to that expectation.

We set the replacement timeline based on the responses of the government agencies. Most agencies requested 6-12 months to replace their certificates. In our experience, replacement certificates are requested only as the deadline approaches. GCA will revoke these certificates by February 28, 2019.

>   3. There's a seeming disconnect between 88 certificates being used by hundreds of government agencies. Can you explain how a single certificate can be used by multiple government agencies and their web servers?

The web servers of several government agencies belonging to the same ministry are hosted together in that ministry's data center. The ministry built a web application firewall that handles TLS decryption, and it applied for and manages the multi-domain certificates for these agencies.
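Serving several agencies from one web application firewall works because a TLS client checks the requested hostname against the certificate's SAN dNSName entries. Under RFC 6125, clients do not fall back to the commonName when such entries are present, which may be one reason browser testing alone did not expose the malformed CN. A simplified sketch of that matching rule (exact match, or a wildcard covering exactly one left-most label); the function name and hostnames are hypothetical:

```python
from typing import List


def matches_san(hostname: str, san_dns_names: List[str]) -> bool:
    """Simplified SAN dNSName matching: exact match, or a left-most wildcard
    label ("*.example.com") that covers exactly one label."""
    host = hostname.lower().rstrip(".")
    for name in san_dns_names:
        name = name.lower().rstrip(".")
        if name == host:
            return True
        if name.startswith("*."):
            first, dot, rest = host.partition(".")
            if dot and first and rest == name[2:]:
                return True
    return False


# Hypothetical SAN list for one ministry-managed multi-domain certificate.
sans = ["agency-a.example.com", "agency-b.example.com", "*.shared.example.com"]
for host in ["agency-b.example.com", "portal.shared.example.com", "agency-c.example.com"]:
    print(host, matches_san(host, sans))
```

Partial-label wildcards such as `f*.example.com` are deliberately not handled in this sketch.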

Status: NEW → ASSIGNED
QA Contact: kwilson → wthayer
Summary: [GRCA] Misissued certificates: Invalid commonName, commonName not in SAN → GRCA: Misissued certificates: Invalid commonName, commonName not in SAN

All affected certificates were revoked on March 4.
We originally scheduled the revocation for February 28. However, 2/28 through 3/3 were holidays in Taiwan, so we carried it out on the next working day.
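The rescheduling above amounts to "the first weekday on or after the planned date that is not a holiday". A small sketch with the standard `datetime` module, assuming the holiday set is exactly the 2/28 to 3/3 period stated in the comment rather than an official holiday calendar:

```python
from datetime import date, timedelta
from typing import Set


def next_working_day(start: date, holidays: Set[date]) -> date:
    """Return `start` if it is a working day, else the next weekday not in `holidays`."""
    day = start
    while day.weekday() >= 5 or day in holidays:  # weekday(): 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


# Assumed holiday set: Feb. 28 to Mar. 3, 2019, as stated in the comment above.
holidays_2019 = {date(2019, 2, 28) + timedelta(days=i) for i in range(4)}
revoked = next_working_day(date(2019, 2, 28), holidays_2019)
print(revoked, (revoked - date(2018, 5, 7)).days)  # date, and days since detection
```

Run as written, this gives 2019-03-04, which is 301 days after the issue was found on 2018-05-07.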

Wayne: I'm not sure if you want additional information. I'm concerned that the issue here was a matter of understanding the applicability of the rules, and the proposed mitigation - linting - only partially addresses that. However, I'm not sure if you have suggestions for better mitigations or next steps?

Flags: needinfo?(wthayer)

> New types of certificates will be checked with cablint/x509lint/zlint. We will change our certificate system update process to include this check.

Has pre-issuance linting been implemented? Also, have any other changes been implemented to prevent this or similar types of operator errors in the future?

Flags: needinfo?(wthayer) → needinfo?(gpki)

Emailed POCs on 2019-07-08 regarding this issue, highlighting https://wiki.mozilla.org/CA/Responding_To_An_Incident#Keeping_Us_Informed

(In reply to Wayne Thayer [:wayne] from comment #15)

> > New types of certificates will be checked with cablint/x509lint/zlint. We will change our certificate system update process to include this check.

> Has pre-issuance linting been implemented? Also, have any other changes been implemented to prevent this or similar types of operator errors in the future?

After multiple evaluations, we implemented another checking mechanism to prevent this and similar types of operator errors.
The domain name in the CN field is now generated by the system instead of being entered manually. The system chooses the first domain name that is shorter than 64 bytes and puts it in the CN field. If no domain name is shorter than 64 bytes, the enrollment is rejected.
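The CN-selection rule described in this comment can be sketched as follows. `choose_common_name` and `EnrollmentRejected` are hypothetical names, and the sketch follows the comment's "shorter than 64 bytes" wording, although RFC 5280's upper bound for commonName (ub-common-name) is 64 characters, so a 64-character name would also fit.

```python
from typing import List

MAX_CN_BYTES = 64  # per the comment, a CN candidate must be shorter than this


class EnrollmentRejected(Exception):
    """Hypothetical error raised when no requested domain name can serve as the CN."""


def choose_common_name(domain_names: List[str]) -> str:
    """Return the first domain name whose UTF-8 encoding is shorter than 64 bytes."""
    for name in domain_names:
        if len(name.encode("utf-8")) < MAX_CN_BYTES:
            return name
    raise EnrollmentRejected("no domain name is shorter than 64 bytes")


long_name = "a" * 60 + ".example.com"  # 72 bytes, skipped
print(choose_common_name([long_name, "www.example.com"]))
```

Assuming the list passed in is the same list used for the SAN extension, the CN is always copied from a single SAN value, so a semicolon-joined CN like the one in this bug cannot be produced by this path.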

Flags: needinfo?(gpki)
Flags: needinfo?(wthayer)

(In reply to National Development Council from comment #17)

> (In reply to Wayne Thayer [:wayne] from comment #15)

> > > New types of certificates will be checked with cablint/x509lint/zlint. We will change our certificate system update process to include this check.

> > Has pre-issuance linting been implemented? Also, have any other changes been implemented to prevent this or similar types of operator errors in the future?

> After multiple evaluations, we implemented another checking mechanism to prevent this and similar types of operator errors.
> The domain name in the CN field is now generated by the system instead of being entered manually. The system chooses the first domain name that is shorter than 64 bytes and puts it in the CN field. If no domain name is shorter than 64 bytes, the enrollment is rejected.

This sounds like a very specific change that prevents CN>64 bytes. How are you preventing "similar types of operator errors"?

Flags: needinfo?(wthayer) → needinfo?(gpki)