Closed Bug 1662807 Opened 4 years ago Closed 4 years ago

GoDaddy: Certificates issued with validity periods greater than 398-days

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ryan.sleevi, Assigned: jfox)

Details

(Whiteboard: [ca-compliance] [ov-misissuance] [dv-misissuance])

Attachments

(2 files)

CA/Browser Forum Ballot SC31 modified the Baseline Requirements to require that certificate validity periods MUST NOT be greater than 398 days and SHOULD NOT be greater than 397 days. The effective date for this requirement was 2020-09-01 00:00:00 UTC.
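
To make the requirement concrete, here is a minimal Python sketch of the check, not an authoritative implementation. It assumes that `notBefore` stands in for the issuance time, that the validity period counts both `notBefore` and `notAfter` (so one second is added to the difference), and that a day is 86,400 seconds with any remainder counting as an additional day. The function and constant names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400
MAX_DAYS = 398
# Effective date stated in this bug: 2020-09-01 00:00:00 UTC.
SC31_EFFECTIVE = datetime(2020, 9, 1, 0, 0, 0, tzinfo=timezone.utc)


def validity_days(not_before: datetime, not_after: datetime) -> int:
    """Validity period in days, inclusive of both endpoints (assumption),
    with any partial day rounded up to a full day."""
    seconds = int((not_after - not_before).total_seconds()) + 1
    return -(-seconds // SECONDS_PER_DAY)  # ceiling division


def violates_sc31(not_before: datetime, not_after: datetime) -> bool:
    """True if the certificate falls under the 398-day rule and exceeds it.

    Both arguments must be timezone-aware; comparing a naive datetime
    against SC31_EFFECTIVE raises TypeError, which is intentional here.
    """
    if not_before < SC31_EFFECTIVE:
        return False  # assumed to be issued before the rule took effect
    return validity_days(not_before, not_after) > MAX_DAYS


# An 825-day certificate dated 30 minutes after the cutoff is flagged.
issued = datetime(2020, 9, 1, 0, 30, tzinfo=timezone.utc)
print(violates_sc31(issued, issued + timedelta(days=825)))  # True
```

Requiring timezone-aware values is a deliberate choice in this sketch: it forces the caller to state which zone a timestamp is in rather than letting a local time pass silently as UTC.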

GoDaddy has issued a number of certificates that violate this requirement. For example, https://crt.sh/?q=28295c7ee49eae3bd51b3e6e51cb24178493190740d1356c663d743078f34357 or https://crt.sh/?q=aed2d43c2a0d8e1291ef2124062970c408cfe47e14ac62ac868d1197856682d6

Please provide an incident report, as per https://wiki.mozilla.org/CA/Responding_To_An_Incident

Flags: needinfo?(jfox)
Attached file List of affected certs
1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

9/1/2020 5:10:47 PM AZ Certificate problem reported via our problem reporting mechanism.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

  • 9/1/2020 5:10:47 PM AZ Certificate problem reported
  • 9/1/2020 8:42 PM Development team alerted and started research
  • 9/2/2020 9:00 AM Reviewed the issue
  • 9/2/2020 12:45 PM Reissued certificates
  • 9/2/2020 4:00 PM Revoked affected certificates

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

Yes, we stopped issuing 825 day certs at 1:00 AM UTC 9/1/2020.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

107 certificates were issued between Midnight UTC and when the 398 day certificate lifetime change went into effect.

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

It was difficult to locate in the document where it stated that all effective dates are UTC time. The change was made well ahead of midnight Arizona time.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

We are scheduling future implementations a day ahead.
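
The time zone gap described in items 3, 4 and 6 can be illustrated with a short Python sketch. It assumes that "AZ" in the report means Arizona time and models it as a fixed UTC-7 offset with no daylight saving; the variable names are illustrative and not taken from GoDaddy's systems.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: Arizona time modeled as a fixed UTC-7 offset (no DST).
ARIZONA = timezone(timedelta(hours=-7), "MST")

# Effective date of the 398-day limit, as stated in this bug.
effective_utc = datetime(2020, 9, 1, 0, 0, tzinfo=UTC)
# Time the report says 825-day issuance stopped.
change_deployed_utc = datetime(2020, 9, 1, 1, 0, tzinfo=UTC)

effective_local = effective_utc.astimezone(ARIZONA)
deployed_local = change_deployed_utc.astimezone(ARIZONA)
exposure = change_deployed_utc - effective_utc

print(effective_local.isoformat())  # 2020-08-31T17:00:00-07:00
print(deployed_local.isoformat())   # 2020-08-31T18:00:00-07:00
print(exposure)                     # 1:00:00
```

Under these assumptions, the change landed six hours before midnight in Arizona but one hour after the UTC deadline, which is consistent with the report's description of when the 107 certificates were issued.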

It was difficult to locate in the document where it stated that all effective dates are UTC time. The change was made well ahead of midnight Arizona time.

This is not really a root cause analysis. GoDaddy had over a month to carefully review these requirements, in addition to the significant amount of discussion period beforehand. Ballot SC31 further called out a number of specific changes for CAs to be aware of. Further, the change in the policy directly mirrored Apple and Google’s stated policies, both of which also unambiguously stated 00:00:00 UTC. If GoDaddy was at all aware of any industry developments, then it is inconceivable to believe this explanation.

The proposed mitigation, to schedule things one day sooner, demonstrates a concerning lack of awareness as to the basic requirements for being a CA. For example, it’s easy to imagine that GoDaddy, in its current form, would argue that it failed to be aware of some other change because “it was hard to find”.

GoDaddy was, as best I can tell, the only CA in Mozilla’s program to make this mistake. If that is not causing GoDaddy to recognize how serious and egregious a failure this is, I’m not sure what will. A minimally acceptable response would detail all of the ways GoDaddy stays abreast of industry and root program requirements, the processes to review those changes, and how they’re being addressed going forward. Anything less would be a failure to recognize the seriousness of the incident, and the goals and objectives of https://wiki.mozilla.org/CA/Responding_To_An_Incident

GoDaddy folks:

It’s been a week here with no response. My hope is that the delays here are because y’all are preparing an exemplary incident report that will raise the bar, for all CAs, as to how to carefully examine root causes and develop holistic mitigations. However likely or unlikely that may be, I want to make sure we’re getting timely updates here, for something so significant and concerning that it is profoundly troubling that it happened.

Ben: Not sure how Mozilla intends to handle this, but CC’ing Apple folks since it’s a violation of their requirements, in addition to Google’s requirements, above and beyond the BRs. This is pretty egregious and concerning to miss a change like this, as it suggests GoDaddy does not have processes and procedures in place to effectively participate as a publicly trusted CA and stay abreast of industry changes. Given that SC31 itself had a redline of every change, as well as a discussion provided of each change, I’m not really sure the explanation provided is remotely reasonable. The lack of timely responsiveness is further concerning as to GoDaddy’s commitments.

Flags: needinfo?(bwilson)

Apologies for not responding sooner, we recognize the seriousness of this issue. We are continuing to, and have been, meeting to discuss this incident and uncover the true root of the problem. Our goal is to correct not just this incident, but prevent incidents going forward. There have been a number of organizational changes including ones that affect responsibilities for staying current and aware of industry regulations. This has been evident by an increased presence of GoDaddy members during various CA/B forum meetings. We will share more as the plan develops and provide updates here in a week.

Flags: needinfo?(jfox)

(In reply to Joanna from comment #6)

This has been evident by an increased presence of GoDaddy members during various CA/B forum meetings.

If anything, this is strong evidence that your organizational changes do not work at all. You voted "Yes" on SC31 itself, which included clarifications regarding the applicable timezone. Apparently without reading it?

Your changed CPS (page 10, version 4.7, https://web.archive.org/web/20200914205248/https://certs.godaddy.com/repository/certificate_practices/en/StarfieldCertificatePolicyandCertificationPracticeStatement.pdf ) raises further questions about your change process. The "Document History" for "4.7" references "Updated section 6.2.3 to reflect 398 day maximum validity period", while it should apparently be section 6.3.2. There seems to be no review process on your side or it is not working at all.

Apart from that (but to complete the picture of a mismanaged CA), https://certs.godaddy.com/repository is offering outdated versions of your CPS depending on the user's "Content-Location" header (e.g., https://web.archive.org/web/20200914205618/https://certs.godaddy.com/repository/certificate_practices/fr/StarfieldCertificatePolicyandCertificationPracticeStatement.pdf , version 3.12).

Thank you for bringing this to our attention, it has already been addressed.

(In reply to Ryan Sleevi from comment #5)

Ben: Not sure how Mozilla intends to handle this, but CC’ing Apple folks since it’s a violation of their requirements, in addition to Google’s requirements, above and beyond the BRs. This is pretty egregious and concerning to miss a change like this, as it suggests GoDaddy does not have processes and procedures in place to effectively participate as a publicly trusted CA and stay abreast of industry changes. Given that SC31 itself had a redline of every change, as well as a discussion provided of each change, I’m not really sure the explanation provided is remotely reasonable. The lack of timely responsiveness is further concerning as to GoDaddy’s commitments.

Thanks Ryan! We are keeping a close eye on the incident response here and more generally within the CA Certificate Compliance component.

18 SEPT Update: This post is to inform you of changes we are making as we continue to discuss this critical issue internally. We identified our ballot review process was previously managed by a single expert. With the organization changes, we found that this expertise is now spread across a few teams. As such, we implemented a more formal ballot tracking process. This includes a link to the ballot, bulleted list of changes, documents to update, the implementation date/ time, and ballot status. The list is updated as ballots are proposed, discussed with the governance group, and evaluated in detail by appropriate teams.
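
As a rough illustration of the tracking fields listed above, the following Python sketch models one ballot entry. It is hypothetical and not GoDaddy's actual tooling: the class name, field names, status values, and the placeholder URL are all assumptions. The one design point it tries to show is that the implementation date/time is stored in UTC and a value without an explicit time zone is rejected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class BallotRecord:
    """Hypothetical tracking entry for one ballot."""
    ballot_url: str
    changes: List[str]
    documents_to_update: List[str]
    effective_at: datetime
    status: str = "proposed"  # assumed status vocabulary

    def __post_init__(self) -> None:
        # Refuse naive datetimes so the zone is never left to guesswork.
        if self.effective_at.tzinfo is None or self.effective_at.utcoffset() is None:
            raise ValueError("effective_at must carry an explicit time zone")
        # Normalize to UTC for storage and comparison.
        self.effective_at = self.effective_at.astimezone(timezone.utc)


# Example entry; the URL is a placeholder, not the real ballot link.
sc31 = BallotRecord(
    ballot_url="https://example.invalid/ballots/SC31",
    changes=["Maximum validity period reduced to 398 days"],
    documents_to_update=["CPS section 6.3.2"],
    # 5:00 PM at a fixed UTC-7 offset is the same instant as midnight UTC.
    effective_at=datetime(2020, 8, 31, 17, 0, tzinfo=timezone(timedelta(hours=-7))),
    status="effective",
)
print(sc31.effective_at.isoformat())  # 2020-09-01T00:00:00+00:00
```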

We will continue to post updates as they become available.

24 SEPT Update: We are continuing to evolve and improve our compliance programs. As the next step in this journey, we established a project team to perform an evaluation of our governance functions. The team currently includes expertise from engineering and technology risk, and will add additional viewpoints across the organization. The group is authorized to make changes to address identified gaps. We are finalizing team composition and will begin project planning next week.

20 OCT Update: Planning activities are underway. This week consisted of gathering data to assist in scoping the project, such as incident reports, governance information, policies, and procedures. We will finalize the plan next week, then determine an appropriate reporting cadence for further posts.

Thanks for the update. It's not clear what the project planning mentioned in Comment #11 involves, or what's being updated in Comment #12.

It sounds like the extent of the project is "review how we approach compliance", but it's not clear why that would take two weeks (so far), and what the scope of that activity is.

More specifically:

  • Is this project evaluating how you approach compliance (e.g. in line with Sectigo's recent restructuring of compliance, to ensure that you're meeting the bare minimum expected of CAs, such as following all CA incidents)
  • Or is this project evaluating how compliant you are presently (e.g. whether there are other changes missed or overlooked, or other opportunities for misissuance)?

Comment #7 raised a reasonable concern about the section references, and I think it's beneficial to specifically understand how that happened. Whether you're reviewing procedures or actual compliance, it does seem useful to provide a holistic, thorough, candid description of the approach used today, one that would explain both how this incident and the issue in Comment #7 happened.

Flags: needinfo?(lwebb)

I appreciate your concern. The project will encompass both our approach to compliance as well as our state of compliance. However, we are approaching this as a problem solving project instead of an audit. This requires taking additional time to formulate the problem statement so we can derive an appropriate solution. This requires careful scoping and taking the time to carve out areas that are operating well so we can focus our attention where it is needed.

It may feel as if the project is moving slowly in the beginning. Once the problem is known, finding the solution will be relatively quick. We appreciate your patience as we work through this. Please let us know if there are further questions we can answer while we work.

Flags: needinfo?(lwebb)

This requires taking additional time to formulate the problem statement so we can derive an appropriate solution. This requires careful scoping and taking the time to carve out areas that are operating well so we can focus our attention where it is needed.

Understandably, I think greater transparency and candid communication than has been provided so far is critical to reassuring the community that GoDaddy is approaching this in a positive way that results in tangible improvements. As it stands, everything from Comment #11 to now provides very few useful details, for CAs or for the community, in that it helps others neither replicate nor review.

I think this is in need of immediate attention to explain how time has been spent.

Flags: needinfo?(jfox)

The additional time taken over the past few weeks includes:

  • Conducting more thorough reviews of ballots with effective dates in the near future to ensure those activities do not result in incidents,
  • Completing work for the Agency Disclosure,
  • Updating CP/CPS documents to align with new rules going into effect, and
  • Landing projects in flight to free up the project team to focus on this initiative.

There are also conversations at the senior management levels to clear the path for the project team to be successful and determine if additional resourcing is needed.

For full transparency, additional ballots and changes that occur during this effort will result in this effort being further extended. These efforts impact our project resources and are prioritized higher to prevent further issues.

16 OCT Update: We sourced the problem to multiple organizational shifts resulting from growth that impacted the process to identify and implement compliance changes. Previously, ballots and incidents ran through one person who distributed changes to impacted teams and monitored to ensure changes were made correctly. This responsibility is now distributed through multiple teams.

The goal of this project is to create a compliance change process that results in 'verified good' changes, and confirm compliance in areas where gaps previously existed. The project will occur in three parts.

  1. Meet with the various teams responsible to implement compliance changes to map out current state and identify gaps.
  2. Leveraging work already done (as referenced in Comment #16), define processes, tools and methods such that changes end with a 'verified good' result.
  3. Where gaps previously existed, perform a state of compliance review to address any lingering issues.

We completed information gathering for step 1 and will finalize the gaps next week. Unless there are objections, we will continue reporting every two weeks.

30 OCT Update: Based on the steps in Comment #17:

  1. Complete. The outcome of this exercise is a documented current-state diagram that highlighted additional missing stakeholders, overlapping work, and missing hand-offs.
  2. In Progress. The team created a new incidents process with a DACI matrix, defined intake, decision frameworks linked to specific requirements (leveraging the timing matrix), response, and after action review. Next is to finalize this process with our stakeholders and draft the process for handling ballots, root store changes, and business changes.
  3. Defined. The team identified follow-on deep-dive projects. Scope and approach will be finalized after item 2.

In addition to the items above, we published an internal newsletter outlining the root store changes from last week's CA/B teleconference and inventoried the changes to ensure they are made correctly and timely.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] → [ca-compliance] Next Update 1-December-2020
Whiteboard: [ca-compliance] Next Update 1-December-2020 → [ca-compliance] Next Update 2020-12-01

13 NOV Update: Based on the steps in Comment #17:

  1. Complete. Refer to Comment #18
  2. Complete. The new process for handling ballots, root store changes, and business changes is in place.
  3. In Progress. The team is updating our control mapping information by associating CA/B documents to WebTrust requirements and cross mapping with our internal and external policies. Concurrently, our development team is tagging policy sections within program code. Controls and tags will be verified by the project team to ensure compliance.

How soon until #3 above will be completed?

Flags: needinfo?(jfox) → needinfo?(lwebb)

The project team is continuing to make progress. Item #3 should be complete by 08 JAN 2021.

Flags: needinfo?(lwebb)
Whiteboard: [ca-compliance] Next Update 2020-12-01 → [ca-compliance] Next Update 2021-01-11

08 JAN Update: The project team completed the control mapping exercise and is nearing completion with control verifications. To date we have not identified any additional gaps beyond the original governance processes put in place in Comment #19. We expect to complete control verifications next week and will provide a final update at that time.

15 JAN Final Update: Control verifications are complete and no further gaps were identified. The new governance processes are in place. We are actively reviewing coming and proposed changes using these processes and they appear to be working well.

I'll close this on Friday, 26-Feb-2021.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next Update 2021-01-11 → [ca-compliance] [ov-misissuance] [dv-misissuance]