Closed Bug 1902868 Opened 1 year ago Closed 1 year ago

GoDaddy: CPR was not responded to in 24 hours

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rdaurne77, Assigned: jreading)

Details

(Whiteboard: [ca-compliance] [policy-failure])

Steps to reproduce:

Please note that GoDaddy/Starfield were emailed at 20:48 UTC advising of multiple mis-issued certificates due to low entropy in the serial number. Email was sent to practices@starfieldtech.com as per both CCADB records and their own CPS.

No reply has been received within 24 hours. Below is the email for transparency:

Hello,

How certain are GoDaddy/Starfield that their serial numbers are being generated with at least 64 bits of entropy from a CSPRNG? This is Ballot 164 effective 2016-07-08. There was a big incident in 2019 over it.

https://search.censys.io/certificates/74f3066c8e2e84cb874fe4b8a23cc1a518b9163d2c394fc2e22fd50225d122f7
https://search.censys.io/certificates/c59895222403c90914b6f1a5ec7bb85a5e46ffcac19920cd68eb570b9a4e71db
https://search.censys.io/certificates/a70ac8569c5b99cc872da29610f14587339e39015563de881b74f3c71da428f2
https://search.censys.io/certificates/1ff57e8e5691e26e0c2a2bbb3f6833e8a812409100bc50410643a265a037b0c1

You seem to be the only CA I can see with serial numbers this short, and presumably it's an older system still in active use.

- Wayne

To note, there are certificates issued with 12 hex characters in the serial number (~122) and with 14 hex characters, accounting for at least 27k certificates. Given there are 4 bits of entropy in a hex character, this accounts for at most 48 and 56 bits of entropy respectively.

Example 48b cert: https://search.censys.io/certificates/1483b295aeb59fd92fe3ee673db5923e34f1974b3ed3a10ae3e2793166ee13e1
Example 56b cert: https://search.censys.io/certificates/834600cd257a2a61f9a17da785635d858391616eed0dd89393aa0737ef7586fb
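
As a rough illustration of the arithmetic above, the following sketch converts a displayed hex length into an upper bound on entropy. The function name is hypothetical, and the bound assumes every displayed hex character is uniformly random, which is the best case.

```python
def max_entropy_bits(hex_digits: int) -> int:
    """Upper bound on entropy for a serial displayed as `hex_digits` hex characters.

    Each hex character carries at most 4 bits, so this is a ceiling, not a measurement.
    """
    return 4 * hex_digits


for digits in (12, 14, 16):
    print(f"{digits} hex characters -> at most {max_entropy_bits(digits)} bits")
```

Only a serial displayed with 16 or more hex characters can reach the 64-bit figure under this bound.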

There are additional issues, however this incident is specifically only to cover the lack of response within 24 hours. Note that this lack of response has occurred as recently as 2021-10-08.

Wouldn't DER encoding strip a leading 00 in the serial number unless the number would be negative without it, because of DER's shortest-encoding-only rule?
Because 7M / 256 ≈ 27,300, I think that is the expected range from doing the bare minimum of 64 bits.

There does seem to be an issue with how Censys and crt.sh handle searching for serial numbers by hex. This is why a different CA has not raised an incident over this: it was a true false positive, where an ASN.1 length of 8 was being misrepresented by the length of its hex.

However, that doesn't reflect the underlying incident due to be raised for GoDaddy. If you check the ASN.1 of the following records you can see the length of the serial number in the ASN.1 encoding itself:
6: https://search.censys.io/certificates/eed45d74e4563b8808b522107ad46f5f8e0075bbe405eec2e270e00133b17595

02 06 74 C9 28 40 06 95

7: https://search.censys.io/certificates/507d4e8ff03ef14e1a08c7bf90bc8626c4215ae5f478657a1422de04afff7d21

02 07 21 AB FC 75 08 60 10

And as an example:
9: https://search.censys.io/certificates/36d1511214a8a9040a2f9207c6bdaf6f77c13735f5e8050c704e4cc7c6748155

02 09 00 B2 C5 3B 95 9E CE AC 72
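
The three dumps above can be checked mechanically. This is a minimal sketch, assuming short-form lengths only (fewer than 128 content octets); `parse_der_integer` is a hypothetical helper written for this illustration, not part of any tool mentioned in the thread.

```python
def parse_der_integer(hex_dump: str) -> tuple[int, int]:
    """Return (content_length, value) for a DER INTEGER given as spaced hex."""
    raw = bytes.fromhex(hex_dump)  # whitespace between octets is ignored
    if raw[0] != 0x02:
        raise ValueError("not an INTEGER tag")
    length = raw[1]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    content = raw[2:2 + length]
    if len(content) != length or len(raw) != 2 + length:
        raise ValueError("length octet does not match the content")
    # DER INTEGERs are two's complement, so decode as signed.
    return length, int.from_bytes(content, "big", signed=True)


for dump in (
    "02 06 74 C9 28 40 06 95",
    "02 07 21 AB FC 75 08 60 10",
    "02 09 00 B2 C5 3B 95 9E CE AC 72",
):
    length, value = parse_der_integer(dump)
    print(f"length={length} octets, value uses {value.bit_length()} bits")
```

The third dump shows the case where a leading 00 octet is present only to keep a full 64-bit value positive.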

I will leave the root cause of all of this to GoDaddy for their other incident, this is mainly to handle their lack of response. To clarify, there has still been no response to date. The other CA in question was emailed at the same minute and we have since talked through this issue in detail privately. They were using ASN.1 serial length values of 8 that were being misread by multiple tools.

Per RFC 5280, the serial number is an INTEGER (ASN.1 universal type 2), not a byte array, so it is encoded in the shortest number of bytes that fits (see the guide from Let's Encrypt). If the serial from your example 6 were encoded with a length of 8 bytes (02 08 00 00 74 c9 28 40 06 95), it would violate the DER encoding rules for the INTEGER type.
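
The minimal-length rule can be sketched as follows. `der_encode_integer` is a hypothetical helper for illustration; it assumes non-negative values and short-form lengths only.

```python
def der_encode_integer(value: int) -> bytes:
    """Minimal DER encoding (tag, length, content) of a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative serials")
    # Reserving one extra bit keeps the top (sign) bit of the content at 0.
    n_octets = value.bit_length() // 8 + 1
    if n_octets >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x02, n_octets]) + value.to_bytes(n_octets, "big")


serial = 0x74C928400695  # value from example 6 in the thread
print(der_encode_integer(serial).hex(" "))

# Content length depends on the magnitude of the value, not on how many
# random bits were requested when it was generated.
for v in (2**63, 2**63 - 1, 2**55, 2**55 - 1):
    print(v.bit_length(), "bit value ->", der_encode_integer(v)[1], "content octets")
```

Under these assumptions the padded 8-octet form of example 6 is never produced, and a 64-bit draw can land in 9, 8, or fewer content octets depending on its leading bits.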

To clarify, the underlying issue is a recurrence of their 2019 incident: https://bugzilla.mozilla.org/show_bug.cgi?id=1533774
During that they clarified their interpretation: https://groups.google.com/g/mozilla.dev.security.policy/c/S2KNbJSJ-hs/m/2UIea4fyBgAJ

It is an accurate statement to say that GoDaddy generates 64 full bits of entropy prior to the DER encoding. When these 64 bits are DER encoded, the result is either 8 or 9 octets written into the cert, depending on whether or not the most significant bit is a 0 (8 octets) or 1 (9 octets). In the case of 9 octets being written, the first octet is always “00” signifying the integer value is positive. It is worth noting: whether that extra “00” octet is present or not, there are always 64 randomly generated bits providing the needed entropy.

RS - The reduction from >1.8M certificates to 12K certificates is a statement that only those 12K certificates lacked a 64-bit entropy contribution?
DR – Yes, the 12k certs are only 7 bytes or less and therefore do not meet the BRs.

RS - possibly 273K certificates which GoDaddy does not consider issued, but otherwise made commitments to issue (such as logging a pre-cert)?
DR - Yes, in most cases we logged a pre-cert prior to final issuance and turnover to the requester. We want to start revoking these certificates as they should be disposed of if not fully issued.

I hope that sufficiently explains the situation so far.

When you run rand(0, 2^64) to get 64 bits of entropy, it will return something less than 2^56 once in 2^8 times, and the encoding of such a serial number would be shorter than 8 bytes. But they can't discard those results as invalid, because if they did, the entropy would be log2(2^64 - 2^56) (≈ 63.994) bits. That is a much subtler version of the same problem that caused 63-bit entropy: generating a 64-bit random number but re-rolling if the highest bit is 1.
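
The figures in this comment can be reproduced with a short calculation. This is a sketch under the comment's own assumption of a uniform draw from [0, 2^64); the variable names are illustrative.

```python
import math

# Chance that a uniform 64-bit value falls below 2^56, i.e. has a zero top byte.
p_short = 2**56 / 2**64

# Entropy left if those short values were discarded and re-drawn.
entropy_if_discarded = math.log2(2**64 - 2**56)

# Entropy left if every value with the highest bit set were re-rolled.
entropy_if_top_bit_rerolled = math.log2(2**63)

print(f"P(value < 2^56) = 1/{round(1 / p_short)}")
print(f"discarding short values: {entropy_if_discarded:.4f} bits")
print(f"re-rolling the top bit:  {entropy_if_top_bit_rerolled:.0f} bits")
```

Discarding the short values leaves just under 64 bits, while re-rolling on the highest bit costs a full bit.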

Absolutely, the next CAs length-wise are using 10 octets at a minimum. If you check discussion on this topic in 2019 you can see that a lot of arguments were had over the precise entropy value in a certificate.

I do have a quick spreadsheet thrown together of the length of the serial number in hex as shown in censys. This is of course not quite correct given what I mentioned in comment #2 but it is a rough summary of the corpus currently. This is a topic for more discussion that should happen on MDSP, feel free to start the topic there if you wish.

This incident is solely about the lack of response, I hope that the upcoming incident will not devolve into discussion of the exact entropy calculation as it never turns out well for anybody.

I agree that the lack of response is a real, separate problem.

Assignee: nobody → brittany
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [policy-failure]
Flags: needinfo?(brittany)

Thank you for assigning this to me. Just wanted to provide an update that we are reviewing this internally.

Flags: needinfo?(brittany)

Incident Report

Summary

Due to human error, GoDaddy did not respond to a certificate problem report within 24 hours of receipt.

Impact

On 6/14/2024, GoDaddy received a certificate problem report from (subject of this Bug) and did not respond with a preliminary report on findings within 24 hours. GoDaddy’s lack of timely response did not meet the Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates, section 4.9.5, which states “within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both the Subscriber and the entity who filed the Certificate Problem Report.”

Timeline

All times are UTC.

  • 2024-06-14 20:49 - Problem report email received from rdaurne77@gmail.com

  • 2024-06-15 02:21 – Registration Authority (RA) Administrator notates the email/ticket as an inquiry on serial number entropy and requests supervisor support.

  • 2024-06-15 20:49 – Bug 1902868 posted

  • 2024-06-18 00:05 – GoDaddy Responded to the original CPR email from rdaurne77@gmail.com

Root Cause Analysis

Initial intake review of certificate problem report did not suggest any type of security or compliance issue. Based on the findings, the administrator escalated to a supervisor who advised that no immediate response was needed.

Lessons Learned

  • Additional training is needed regarding CPR communication obligations.

What went well

  • The administrator recognized that the ticket needed additional review and contacted the available supervisor.

What didn't go well

  • Manual processes for supervisor escalation did not result in the correct course of action.

Where we got lucky

  • Not Applicable

Action Items

Action Item | Kind | Due Date
Conversation and re-enforcement with administrator and supervisor on existing procedures | N/A | 2025-06-17
Refresher training for the certificate vetting department including validation specialists, administrators, supervisors, and leadership on the certificate problem reporting process | Prevent | 2024-06-28

Appendix

Details of affected certificates

N/A – incident was related to certificate problem report mentioned above

GoDaddy, did you miss the point from Chrome about not identifying reporters? Can you please explain what purpose that serves in this report?

Assignee: brittany → jreading

(In reply to Johnny from comment #9)

Summary

Due to human error, GoDaddy did not respond to a certificate problem report within 24 hours of receipt.

Are GoDaddy under the impression that human error is an acceptable explanation?

Impact

On 6/14/2024, GoDaddy received a certificate problem report from (subject of this Bug) and did not respond with a preliminary report on findings within 24 hours. GoDaddy’s lack of timely response did not meet the Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates, section 4.9.5, which states “within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both the Subscriber and the entity who filed the Certificate Problem Report.”

For transparency I will note that the only email reply I have received so far contains:

Dear Wayne,

We have received your Certificate Problem Report regarding GoDaddy/Starfield serial numbers not having the necessary 64 bits of entropy.

This email serves as a preliminary response. We have escalated this report to our engineering teams for review. As soon as this investigation is complete, we will respond accordingly.

Please let us know if we can help you in any other way.

If there has been a preliminary report on GoDaddy's findings then it is not in my mailbox, nor indeed raised on here.

Root Cause Analysis

Initial intake review of certificate problem report did not suggest any type of security or compliance issue. Based on the findings, the administrator escalated to a supervisor who advised that no immediate response was needed.

I will further note for the record that the subject of the original email was "Certificate Problem Report".

I can't see the action item type of 'N/A' on CCADB's incident report template. Could GoDaddy explain where it is?

Now, were GoDaddy to have been reading bugzilla as they are required to they would already have seen this comment. Please explain where the misunderstanding after this reiteration arose.

Flags: needinfo?(jreading)
  • 2024-06-18 00:05 – GoDaddy Responded to the original CPR email from [redacted]

Apologies I forgot to include a time for when that email was received: 2024-06-17 23:06:52 UTC.

I don't know how GoDaddy sent an email back in time, but it does show initiative.

(In reply to amir from comment #10)

GoDaddy, did you miss the point from Chrome about not identifying reporters? Can you please explain what purpose that serves in this report?

Our interpretation of that comment is confidentiality for reporters would not apply if they self-identified on the thread. Since we are replying to the publicly submitted bug report, we included the email address in the timeline to show consistency in communication.

Flags: needinfo?(jreading)

(In reply to Wayne from comment #11)

(In reply to Johnny from comment #9)

Summary

Due to human error, GoDaddy did not respond to a certificate problem report within 24 hours of receipt.

Are GoDaddy under the impression that human error is an acceptable explanation?

The report was properly reviewed and escalated internally according to our internal policies and procedures. Unfortunately, an error of judgment was made regarding the need to respond to the initial report. The action plan is to prevent this from happening again by enhancing training for those working CPR reports, including management.

Impact

On 6/14/2024, GoDaddy received a certificate problem report from (subject of this Bug) and did not respond with a preliminary report on findings within 24 hours. GoDaddy’s lack of timely response did not meet the Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates, section 4.9.5, which states “within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both the Subscriber and the entity who filed the Certificate Problem Report.”

For transparency I will note that the only email reply I have received so far contains:

Dear Wayne,

We have received your Certificate Problem Report regarding GoDaddy/Starfield serial numbers not having the necessary 64 bits of entropy.

This email serves as a preliminary response. We have escalated this report to our engineering teams for review. As soon as this investigation is complete, we will respond accordingly.

Please let us know if we can help you in any other way.

If there has been a preliminary report on GoDaddy's findings then it is not in my mailbox, nor indeed raised on here.

Root Cause Analysis

Initial intake review of certificate problem report did not suggest any type of security or compliance issue. Based on the findings, the administrator escalated to a supervisor who advised that no immediate response was needed.

I will further note for the record that the subject of the original email was "Certificate Problem Report".

I can't see the action item type of 'N/A' on CCADB's incident report template. Could GoDaddy explain where it is?

This action item has been updated to “Prevent”.

Action Item | Kind | Due Date
Conversation and re-enforcement with administrator and supervisor on existing procedures | Prevent | 2025-06-17
Refresher training for the certificate vetting department including validation specialists, administrators, supervisors, and leadership on the certificate problem reporting process | Prevent | 2024-06-28

Now, were GoDaddy to have been reading bugzilla as they are required to they would already have seen this comment. Please explain where the misunderstanding after this reiteration arose.

Our interpretation of that comment is confidentiality for reporters would not apply if they self-identified on the thread. Since we are replying to the publicly submitted bug report, we included the email address in the timeline to show consistency in communication.

I would advise that GoDaddy reflect on their own interpretation after prior comments...

Nevertheless, we are still in the situation where no preliminary report on the underlying issue has appeared in either my inbox or this thread. To date, the only communication from GoDaddy's side has been within this incident, besides the single email that I have documented.

It is at this point that I will note a similar incident as documented from #1900654:

Community member comments in Comment 1 are similar in nature to incident reporting expectations reminders shared in response to https://bugzilla.mozilla.org/show_bug.cgi?id=1883416 (i.e., Comment 1). Consequently, this report falls short of expectations.

Opportunities for improvement:

  • The timeline is incomplete.
  • There are opportunities for formatting improvements (i.e., table Markdown is broken).
  • The reporting template was not followed (e.g., there’s no “What didn’t go well?” list).
  • Root cause is, again, interpreted as “human error”, which is considered insufficient.
  • The remediation defaults to additional human processing and manual processes to prevent this issue from repeating. This is not considered a complete or systemic solution, especially considering over-reliance on humans was a contributing factor for this incident. For example, are there opportunities for linting, possibly to include use in a test environment, to prevent future profile non-conformance?

Also, can you help us understand why reminders shared in https://bugzilla.mozilla.org/show_bug.cgi?id=1883416 were not considered when responding to this incident as a demonstration of continuous improvement?

Particularly timeline, root cause, and remediation. I hope that GoDaddy acknowledges that their internal policy and procedures do not supersede CCADB policy on incident reporting. We are approaching 1 week from when the CPR was first sent and no update has since appeared.

Do we have even a rough estimate on when any potential investigations are likely to start? Noting that these are supposed to be done within 3 days of receipt of a CPR.

Flags: needinfo?(jreading)

Timeline and Action Item Updates

Timeline Updates

All times are UTC.

  • 2024-06-17 20:30 - Leadership discussed the original CPR and Bug 1902868. No violation was found during the initial review of the problem report. However, advanced teams started additional research.

  • 2024-06-20 18:30 - The additional research confirmed that we remain in compliance with section 7.1.2.7 of the BRs.

  • 2024-06-20 21:07 - Responded to the problem report with our findings

  • 2024-06-20 21:19 - An additional response was received from the reporter asking for further review

  • 2024-06-21 19:54 - Responded with additional details regarding our findings

Action Item Updates

Action Item | Kind | Due Date
Conversation and re-enforcement with administrator and supervisor on existing procedures | Prevent | 2024-06-17
  • Completed as of 2024-06-17
Refresher training for the certificate vetting department including validation specialists, administrators, supervisors, and leadership on the certificate problem reporting process | Prevent | 2024-06-28
  • Scheduled for 2024-06-26
Flags: needinfo?(jreading)

I can confirm that timeline. With regard to this incident specifically, how does the action list reflect previous statements in #1734953? There have been more incidents since then, but what has actually changed?

I will note that Comment 15 still applies and has not been responded to yet.

Flags: needinfo?(jreading)

Related to the following mitigations committed to in #1734953, I have added applicable notes inline related to this specific incident.

  • M1. System Update (Completed): Implement automated alerts to notify the RA team and management to provide additional visibility into CPR inbox messages.

This action was completed in 2021 and there is alerting in place that notifies the team when new emails are received. There are also daily posts alerting management that the CPR email queue has been cleared. In this case, the alerting showed that the CPR email queue had been cleared even though the team had dispositioned the CPR with an internal only note and had not provided a customer email response (i.e. failing in judgement to respond).

  • M2. Process Updates (Completed): Update monitoring schedule to ensure more coverage and redundancy. Additionally, within team documentation, formalize a RACI chart to further clarify roles and responsibilities.

This action was completed in 2021. Coverage and redundancy, including escalation capabilities, are in place, including during weekend hours, and there is a schedule that documents who is responsible for monitoring the CPR queue. In this case the CPR was reviewed and escalated to a supervisor within the 24-hour timeframe. As previously noted, the cause of the lack of response was an error in human judgement.

  • M3. People Training (Completed): Train/coach RA to reinforce importance of Certificate Problem Reports, and all associated timeframes.

Targeted training was completed in 2021. It is correct to say that we are having to do a targeted reinforcement again because of this issue (and the root cause being an error in judgement).

The referenced incident from 2021 was the result of CPRs not being reviewed and therefore not responded to within 24 hours. As a result, the additional alerts and process updates described above were implemented. The alerts and process updates implemented in 2021 ensured that the current problem report was reviewed, escalated and commented on internally well within 24 hours. Due to an error in judgement no response was sent at the time of initial review.

We consider the posted timeline, root cause and action items sufficient to address this incident. We also recognize an area of improvement regarding communication on the initial reply to the CPR. Additional statements should have been included regarding our preliminary review indicating that while no violation was found it was being sent to our Engineering team for further review. Training to clarify communication of the preliminary report while in parallel escalating it for further review as needed was included in the documented training delivered on 2024-06-26.

GoDaddy has always striven, and continues to strive, to align with CCADB policy on incident reporting.

We updated our timeline in comment 16 to better answer this question and show when our investigation was completed as well as when a response was provided to the reporter as shown in the following:

  • 2024-06-20 18:30 - The additional research confirmed that we remain in compliance with section 7.1.2.7 of the BRs.

  • 2024-06-20 21:07 - Responded to the problem report with our findings

We could have been clearer with this timeline, however this is an indication that our investigation is complete, and we have communicated the final findings back to the problem reporter.

Action Items Update

Action Item | Kind | Due Date
Regular refresher training for the certificate vetting department including validation specialists, administrators, supervisors, and leadership on the certificate problem reporting process | Prevent | 2024-06-26
  • Completed as of 2024-06-26
Flags: needinfo?(jreading)

No further action is pending. GoDaddy continues to monitor this issue for any further comments or questions.

Unless there are additional questions or issues to discuss, can we close this sometime later this week?
Thanks,
Ben

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED