Closed Bug 1618256 Opened 2 years ago Closed 2 years ago

DigiCert: Failure to properly encode Subject name

Categories

(NSS :: CA Certificate Compliance, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ryan.sleevi, Assigned: brenda.bernal)

Details

(Whiteboard: [ca-compliance])

In https://bugzilla.mozilla.org/show_bug.cgi?id=1576013#c52, DigiCert reported additional certificates that they had determined were non-conforming with the BRs. However, as these do not clearly share the root cause of Bug 1576013, it's best to treat them as a separate issue.

Jeremy, Brenda: I opened this as a separate issue, for the set of certificates reported in Bug 1576013, as they seem to have a different root cause.

I've not filtered out the list of reported certificates, below, to figure out which are applicable to this specific issue, but I request that you do so as part of providing an incident report.

In particular, I'm concerned that this seems similar to issues like Bug 1353827, which should have been resolved three years ago. It would be good to ensure that that incident is included in the timeline, what steps were taken or were not taken (e.g. was this only focused on CN, rather than on other subject fields), and what steps have been taken to prevent future similar misissuances.

https://crt.sh/?id=74590291
https://crt.sh/?id=286806581
https://crt.sh/?id=70739892
https://crt.sh/?id=72857136
https://crt.sh/?id=287116546
https://crt.sh/?id=104477778
https://crt.sh/?id=133856953
https://crt.sh/?id=110994920
https://crt.sh/?id=117188269
https://crt.sh/?id=111451469
https://crt.sh/?id=95803736
https://crt.sh/?id=99467695
https://crt.sh/?id=120504633
https://crt.sh/?id=110994922
https://crt.sh/?id=109730800
https://crt.sh/?id=105227697
https://crt.sh/?id=113638875
https://crt.sh/?id=103935048
https://crt.sh/?id=113186529
https://crt.sh/?id=103935043
https://crt.sh/?id=98610448
https://crt.sh/?id=113186528
https://crt.sh/?id=280755018
https://crt.sh/?id=87895536
https://crt.sh/?id=282315194
https://crt.sh/?id=287250778
https://crt.sh/?id=94094035
https://crt.sh/?id=287387058
https://crt.sh/?id=86771271
https://crt.sh/?id=98621178
https://crt.sh/?id=287116514
https://crt.sh/?id=98610440
https://crt.sh/?id=287116194

Assignee: wthayer → brenda.bernal
Status: NEW → ASSIGNED
Flags: needinfo?(brenda.bernal)
Whiteboard: [ca-compliance]

Incident Report – Mozilla Policy Violation (Failure to properly encode Subject name)

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On February 13, 2020, DigiCert conducted an internal linter audit of our certificates and identified certificates that contained one of the following errors:

  1. organization names that exceed 64 characters – 31 certs
  2. non-informational values in the OU field (a "-") – 2 certs
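To illustrate the two error conditions above, here is a minimal Python sketch of a subject-name check. It assumes the subject has already been decoded into (attribute, value) pairs, uses the RFC 5280 Appendix A upper bounds for a few common attributes, and treats an OU made up only of "-", "." or spaces as non-informational. The function name `lint_subject` is hypothetical; this is not DigiCert's linter.

```python
# Hypothetical lint sketch, not DigiCert's actual linter.
# Assumes the subject DN is already decoded into (attribute, value) pairs.

# Upper bounds from RFC 5280 Appendix A (ub-common-name, ub-organization-name, ...).
UPPER_BOUNDS = {
    "commonName": 64,
    "organizationName": 64,
    "organizationalUnitName": 64,
    "localityName": 128,
    "stateOrProvinceName": 128,
}


def lint_subject(subject):
    """Return human-readable findings for one subject DN."""
    findings = []
    for attr, value in subject:
        limit = UPPER_BOUNDS.get(attr)
        # len() counts characters (code points), not encoded bytes.
        if limit is not None and len(value) > limit:
            findings.append(f"{attr} exceeds {limit} characters ({len(value)})")
        # An OU consisting only of '-', '.' or spaces (or empty) carries no information.
        if attr == "organizationalUnitName" and value.strip(" .-") == "":
            findings.append(f"{attr} has non-informational value {value!r}")
    return findings


subject = [("organizationName", "A" * 65), ("organizationalUnitName", "-")]
for finding in lint_subject(subject):
    print(finding)
```

For the sample subject this reports both problems. Checking every length-limited attribute, rather than only commonName and organizationName, is the broader scope raised later in this bug.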

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

February 13, 2020 – Compliance analytics team ran linter analysis on existing active certificates
February 14, 2020 – Initial analysis results provided to DigiCert Support team to investigate
February 17, 2020 – Confirmed list of certificates with errors that required revocation
February 21, 2020 – List of 33 active certificates revoked (5 days)
February 22, 2020 – I posted the revocation action taken, with a list of crt.sh links, on https://bugzilla.mozilla.org/show_bug.cgi?id=1576013
February 26, 2020 – Ryan Sleevi informed us that he had opened Bug 1618256 for a separate incident report.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

On 05-April-2017, an issue with common names and organization names exceeding the maximum length was identified in the mozilla.dev.security.policy forum, and a bug was filed (https://bugzilla.mozilla.org/show_bug.cgi?id=1353827). DigiCert resolved the issue with a patch on 09-May-2017. From that point on, DigiCert’s systems blocked common names and organization names that exceed the maximum length.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

The list of certificates is above: https://bugzilla.mozilla.org/show_bug.cgi?id=1618256#c1
First issuance: 13-December-2016
Last issuance: 03-April-2017

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

See 4 above

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

As indicated in 3) above, the systems were patched to block common names and organization names longer than 64 characters. However, the fix only applied going forward, blocking new issuance. At that time, a scan was conducted to identify existing certificates with overlong common names. The scan did not cover overlong organization names or non-informational values in the OU field, which are the subject of this incident.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

For the last 4.5 months, our Compliance function has focused on finding certificate anomalies in our issuance that we need to correct. Our methodology is to identify an error condition and then conduct a comprehensive sweep over the entire population of active certificates. As part of our incident management approach, our goal is to continue to identify these types of issues, conclusively and comprehensively. We expect this approach to close the gap in remediating all affected certificates going forward.
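The sweep methodology described above (find an error condition, then check the entire active population) might look roughly like the following Python sketch. The `CertRecord` fields, the `sweep` and `org_too_long` functions, and the IDs are hypothetical. What it illustrates is that the scope is every active certificate rather than only new requests, and that results come out as one list per distinct problem, as item 5 of the report template asks.

```python
# Hypothetical sweep sketch; record fields and the lint callable are illustrative.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

Subject = List[Tuple[str, str]]


@dataclass
class CertRecord:
    crtsh_id: int          # hypothetical identifier, e.g. a crt.sh ID
    not_after: datetime    # expiry time (timezone-aware)
    revoked: bool
    subject: Subject


def sweep(certs: List[CertRecord], lint: Callable[[Subject], List[str]],
          now: datetime) -> Dict[str, List[int]]:
    """Lint every active (unexpired, unrevoked) cert; one ID list per distinct problem."""
    by_problem: Dict[str, List[int]] = {}
    for cert in certs:
        if cert.revoked or cert.not_after <= now:
            continue  # out of scope for revocation: already revoked or expired
        for problem in lint(cert.subject):
            by_problem.setdefault(problem, []).append(cert.crtsh_id)
    return by_problem


def org_too_long(subject: Subject) -> List[str]:
    # Example check: returns a problem category, not a per-certificate message.
    if any(attr == "organizationName" and len(v) > 64 for attr, v in subject):
        return ["organizationName > 64 characters"]
    return []


now = datetime(2020, 2, 13, tzinfo=timezone.utc)
later = datetime(2021, 1, 1, tzinfo=timezone.utc)
earlier = datetime(2019, 1, 1, tzinfo=timezone.utc)
long_org = [("organizationName", "A" * 65)]
certs = [
    CertRecord(1, later, False, long_org),    # active: reported
    CertRecord(2, earlier, False, long_org),  # expired: skipped
    CertRecord(3, later, True, long_org),     # revoked: skipped
]
print(sweep(certs, org_too_long, now))
```

Passing the check in as a callable keeps the sweep independent of any single error condition, so the same loop can be rerun whenever a new condition (another length-limited field, say) is identified.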

Flags: needinfo?(brenda.bernal)

Thanks for updating this, Brenda.

I think it may be useful to expand the timeline to provide a bit more context of the original event. You can edit the comment, or you can revise it by adding a new comment. In particular, you end up discussing events not covered on the timeline, while the timeline is meant to help provide that necessary context for the overall event.

Part of my concern with the report as provided is:

  • The dates conflict with those provided in Bug 1353827; in that bug, the patch was reportedly applied 2017-Mar-09; this states 2017-May-09.
  • The current report only discusses commonName and organization, while there are similarly other fields with limits. Did DigiCert address these other fields as well? While I raised this in Comment #1, I'm having trouble discerning this from the report.
  • In Bug 1353827, DigiCert reported scanning their systems on 2017-Mar-9, it's not clear why this scan either:
    a. Didn't examine other fields; or,
    b. Failed to detect these certificates

DigiCert has, in the past, provided rather detailed incident reports, in part recognizing the understandable concern that comes from the number of compliance issues being found or which have existed historically. I think this incident report leaves much to be desired in terms of addressing the concerns. While I understand that events three years ago can be difficult to investigate, I'm concerned that this report has seeming inconsistencies with past statements, which would be an avoidable issue, and doesn't seem to acknowledge where the gaps may have been in historic understanding and where/why/how they're being addressed now. For example, this seems an entirely missed opportunity to contextualize whether or how Bug 1526154 is relevant to this, or any other improvements.

I'm hoping you can revisit this report, and the overall approach, and make sure each of these incidents are treated as if they might lead to CA distrust, and where the primary mitigation for that is having comprehensively detailed reports that can stand on their own, are consistent with past statements, and perhaps build an overall picture by referencing past incidents and events when they contain relevant or useful information or context. Such reports help build a better understanding, for all CAs, and help discern good practices that can or should be formalized as baseline requirements.

Flags: needinfo?(brenda.bernal)

DigiCert's responses to Comment 3 above are inline below, marked with >>.

Part of my concern with the report as provided is:
• The dates conflict with those provided in Bug 1353827; in that bug, the patch was reportedly applied 2017-Mar-09; this states 2017-May-09.
    • Similarly, it's suggested the issue was reported 2017-April-05, yet the original report was 2017-Mar-08.

This was a typo in the patch date: “May” was written instead of “Mar”. Sorry for the confusion.
I made a mistake in grabbing the date from the original report (2017-April-05) on the Bugzilla by Kathleen and did not pick up the date from the thread in the original moz.dev.security post. I am restating the original issue report date to 2017-Mar-08.>>

• The current report only discusses commonName and organization, while there are similarly other fields with limits. Did DigiCert address these other fields as well? While I raised this in Comment #1, I'm having trouble discerning this from the report.

No, DigiCert has not looked at all of the length-limited fields, only the organization name. This is a good suggestion to expand our scope; thank you.>>

• In Bug 1353827, DigiCert reported scanning their systems on 2017-Mar-9, it's not clear why this scan either:
a. Didn't examine other fields; or,
b. Failed to detect these certificates

The scan that was completed in 2017-Mar-09 was only on the common name and not the organization name field. While we recognize that the limitation of scope did not catch these certificates then, we have instituted a comprehensive approach and process to do system wide sweeps across all active certs in scope of the problem analysis area. >>

DigiCert has, in the past, provided rather detailed incident reports, in part recognizing the understandable concern that comes from the number of compliance issues being found or which have existed historically. I think this incident report leaves much to be desired in terms of addressing the concerns. While I understand that events three years ago can be difficult to investigate, I'm concerned that this report has seeming inconsistencies with past statements, which would be an avoidable issue, and doesn't seem to acknowledge where the gaps may have been in historic understanding and where/why/how they're being addressed now. For example, this seems an entirely missed opportunity to contextualize whether or how Bug 1526154 is relevant to this, or any other improvements.
I'm hoping you can revisit this report, and the overall approach, and make sure each of these incidents are treated as if they might lead to CA distrust, and where the primary mitigation for that is having comprehensively detailed reports that can stand on their own, are consistent with past statements, and perhaps build an overall picture by referencing past incidents and events when they contain relevant or useful information or context. Such reports help build a better understanding, for all CAs, and help discern good practices that can or should be formalized as baseline requirements.

Bug 1526154 concerns a missed revocation of certificates containing underscores. In that bug, filed on 2019-02-07, we described making substantial improvements to our data reporting, as part of our never-again plan, to ensure thoroughness and accuracy whenever we are required to revoke certificates because of ballots and related incidents.
The effort that concluded with the revocation of the 33 certificates that are the subject of this Bugzilla was an internal effort by the Compliance team to scan for any anomalies in our certificate issuance.
As I mentioned in section 7) of https://bugzilla.mozilla.org/show_bug.cgi?id=1618256#c2 above, our team launched an effort 4.5 months ago to run analytics on our issued certificate population, with the objective of finding these anomalies, correcting them, reporting them to the community, and sharing our findings and remediation. We are running the scans not just on DigiCert certificates but across the industry. We are scanning for all issues that our linter would catch now but that may not have been adequately scanned for before the linter was deployed.
We intend to continue these efforts as a way to help improve practices across the CA industry. >>

Flags: needinfo?(brenda.bernal)

(In reply to Brenda Bernal from comment #4)

I made a mistake in grabbing the date from the original report (2017-April-05) on the Bugzilla by Kathleen and did not pick up the date from the thread in the original moz.dev.security post. I am restating the original issue report date to 2017-Mar-08.>>

Are these incident reports cross-checked? I can understand that mistakes happen, but the incident report is the primary way that the CA has to assure the community they "take security seriously". I understand transcription errors happen, but restating the original issue report date to 2017-Mar-08 only opens up new questions, because it means you had a gap where misissuance continued from when the issue was reportedly patched, 2017-Mar-09, until 03-April-2017.

I implore DigiCert to carefully evaluate the next response, and take this as an opportunity to regroup and reframe, perhaps even treating it as an opportunity to do the incident report over, applying the principles I'm trying to communicate here, Comment #3, and Comment #1. Making sure the incident report is consistent and that the timeline provides the full picture of any necessary or relevant details, is a great way to check to make sure that root causes are identified.

• In Bug 1353827, DigiCert reported scanning their systems on 2017-Mar-9, it's not clear why this scan either:
a. Didn't examine other fields; or,
b. Failed to detect these certificates

The scan that was completed in 2017-Mar-09 was only on the common name and not the organization name field. While we recognize that the limitation of scope did not catch these certificates then, we have instituted a comprehensive approach and process to do system wide sweeps across all active certs in scope of the problem analysis area. >>

To make sure I understand:

  • 2017-03-08, Bug 1353827 was opened about both commonName and organizationName violations
  • 2017-03-09, DigiCert patches both of these fields, as well as the other fields in RFC 5280
  • 2017-03-09, DigiCert scans their entire system, but only for commonName issues, despite the bug being about more
  • 2017-04-03, DigiCert's last misissued cert with an overlong name
  • 2019-02-07, DigiCert reports that it will improve its scripting in the future to ensure certificates are not overlooked in the scope of an investigation

You can see where having a comprehensive timeline starts to reveal curious questions and gaps.

I appreciate that this issue is the result of more comprehensive testing, and while I'm not entirely sure I know why it takes 4.5 months to find an issue easily found with existing linters, I won't begrudge that something is better than nothing. I expected, going into this, that DigiCert would have been able to provide an exemplary incident report, one that shows how this historic misissuance was a long-since remedied anomaly, and that the various root causes (the technical compliance failure, the failure in the original scan, the failure to detect until now) had all long since been remedied as part of DigiCert's overall compliance, with references to the past bugs that help build a picture of the many practices DigiCert has put in place since then. However, that hasn't really been the case, and now I'm worried that perhaps I was too optimistic in relying on DigiCert's assertions about systemic changes being made. That's the difference a 'good' incident report can make, and that's why I'm encouraging you to consider whether you'd like to (effectively) attempt a new incident report (as a new comment), rather than try to fill in the gaps.

Of course, you don't have to do so. I think we've got most of the information, so if you just want to address the gap from March to April, we can go that route. But it would be a shame for DigiCert to pass up an opportunity to help provide an illustrative incident report. I'd like to suggest y'all carefully evaluate what would be the most beneficial, for both DigiCert and the ecosystem, and perhaps review internally any draft posted next.

Flags: needinfo?(brenda.bernal)

We are working on pulling historical information to respond to your inquiries here. We will update once we have the information. Thank you.

Flags: needinfo?(brenda.bernal)

Ryan, in response to https://bugzilla.mozilla.org/show_bug.cgi?id=1618256#c5, which requested a revised and expanded timeline from the initial report of the issue through the current remediation of the remaining problematic certificates, please see below:

Mar 8, 2017 – Reported by Ryan Sleevi on mozilla.dev.security.policy; DigiCert began investigating, revoked the certificate, and scanned its systems for names that were too long.
Mar 9, 2017 – DigiCert (Jeremy) replied to Ryan about the root cause as follows (abbreviated):
The certificate was issued by a DigiCert employee as a test on our systems to see whether we had resolved an issue with a path permitting CN fields greater than 64 characters. The policy was not followed in this case.
- The RA system was patched for all subject fields that have a maximum length
- Scans were run and found only one instance of a CN that was too long, which was revoked
Back then, DigiCert used a “snapshot” system for validation that allowed certificates to be issued if the snapshot was still valid. At the time, the compliance checks were implemented on the RA, and the snapshot system effectively skipped them during issuance. DigiCert realized the checks were being skipped later on; therefore, improvements/patches were applied.
April 3, 2017 – DigiCert issues certificate with org name exceeding 64 characters
April 5, 2017 – Compliance checks for snapshot-based issuance were moved to the CA, so that they run before issuance. Problematic certificates were blocked or corrected before issuance.
Feb 7, 2019 – DigiCert reported that it will improve its scripting system in relation to underscore scanning. (referenced above in comment #5)
Feb 28, 2019 – DigiCert Development started making improvements with our scripting for future incidents.
Sept 16, 2019 – Our DigiCert Compliance team establishes its analytics function, with the purpose of bolstering our internal capability to research certificate issues and ensure a comprehensive scan and response.
The analytics team’s primary initial focus was the EV JOI issue, since that was a key bug that DigiCert was working on at that time. Originally, the team focused on validation sources and the associated registration numbers by source, for internal use. Once the team completed its JOI investigations, the plan was to move on to other projects, including revisiting historical bugs that may have been treated differently.
Feb 1, 2020 – DigiCert Compliance Analytics team sorted through our project pipeline and started establishing non-JOI priorities. We then launched a full scan of all certificates, including for pre-2019 issues.
Feb 13, 2020 – Compliance analytics team runs the linter on all certificates to see whether we missed anything from before incident reporting was applied more uniformly, both within DigiCert and at Mozilla.
Feb 14, 2020 – Initial analysis results provided to DigiCert Support team to investigate.
Feb 17, 2020 – Confirmed list of certificates with errors that required revocation.
Feb 21, 2020 – List of 33 active certificates revoked (5 days).
Feb 22, 2020 – DigiCert posted on https://bugzilla.mozilla.org/show_bug.cgi?id=1576013 the revocation action taken with a list of crt.sh links.
Feb 26, 2020 – Ryan Sleevi informed us that he had opened Bug 1618256 for a separate incident report.

Flags: needinfo?(brenda.bernal)

Thanks Brenda. The expanded timeline and detail helps build a better understanding here.

To make sure I'm accurately understanding:

  • DigiCert's original compliance patch (on 2017-03-09) only focused on new requests from new organizations (the RA platform)
  • This meant that organizations which had been previously validated, despite violating the requirements, were still allowed to issue, because they were in the "snapshot" system
  • On 2017-04-05, DigiCert development moved the validation from the RA into the CA, preventing new issuance
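To illustrate why moving the check from the RA into the CA closed the gap, here is a simplified Python model of the two issuance paths as described in this bug. It is a hypothetical sketch: `ra_check`, `ca_sign`, `issue` and the `via_snapshot` flag are invented names, not DigiCert's system. Because snapshot-based issuance never calls `ra_check`, only the CA-side gate catches an overlong name on that path.

```python
# Hypothetical model of the two issuance paths; names are illustrative only.
UPPER_BOUNDS = {"commonName": 64, "organizationName": 64, "organizationalUnitName": 64}


class IssuanceBlocked(Exception):
    """Raised when a subject fails a compliance check."""


def subject_problems(subject):
    return [
        f"{attr} longer than {UPPER_BOUNDS[attr]} characters"
        for attr, value in subject
        if attr in UPPER_BOUNDS and len(value) > UPPER_BOUNDS[attr]
    ]


def ra_check(subject):
    # Check at the RA: in this model it only runs for fresh requests.
    problems = subject_problems(subject)
    if problems:
        raise IssuanceBlocked("RA: " + "; ".join(problems))


def ca_sign(subject):
    # Final gate at the CA: every path reaches it, so no path can bypass it.
    problems = subject_problems(subject)
    if problems:
        raise IssuanceBlocked("CA: " + "; ".join(problems))
    return {"subject": subject, "signed": True}


def issue(subject, via_snapshot):
    # Reusing a prior validation "snapshot" skips the RA step entirely.
    if not via_snapshot:
        ra_check(subject)
    return ca_sign(subject)


try:
    issue([("organizationName", "A" * 65)], via_snapshot=True)
except IssuanceBlocked as err:
    print(err)
```

Running the example prints the CA-side rejection. A gate like this only prevents new misissuance; certificates issued before the change still need a retroactive sweep, which is the gap discussed below.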

I think what's unclear to me is the following timing:

DigiCert realized the checks were being skipped later on; therefore, improvements/patches were applied.

It sounds like that was sometime after 2017-03-09, and before 2017-04-05, because it ultimately led to the work being done on 2017-04-05. When that work was done, there also wasn't a retroactive analysis made for certificates in the 2017-03-09 to 2017-04-05 range; that didn't happen until the 2020-02-01/2020-02-13 full scan, right?

As this incident is nearly three years old, I'm going to set N-I for Wayne. I don't believe DigiCert was the only one to take this approach to compliance (that is, prevent new badness from getting in, without addressing existing badness), and it's clear that this approach has changed through February of this year.

Flags: needinfo?(wthayer)

In response to the following,

It sounds like that was sometime after 2017-03-09, and before 2017-04-05, because it ultimately led to the work being done on 2017-04-05. When that work was done, there also wasn't a retroactive analysis made for certificates in the 2017-03-09 to 2017-04-05 range; that didn't happen until the 2020-02-01/2020-02-13 full scan, right?

Work to patch the skipped-check issue began on 2017-03-13 and was rolled out on 2017-04-05. It is correct that a full scan was not done until the work we completed this past February (2020).

It appears that all questions have been answered and remediation is complete.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(wthayer)
Resolution: --- → FIXED