GoDaddy: Intermittent unauthorized OCSP response when certificate is freshly issued
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: amir, Assigned: star)
Details
(Whiteboard: [ca-compliance] [ocsp-failure])
Through https://sslmate.com/labs/ocsp_watch/ I've noticed that for some period of time, OCSP status is unavailable for freshly minted certificates on GoDaddy.
I reached out to GoDaddy through its Certificate Problem Report (CPR) process with specific certificates that had issues and with the OCSP Watch link, but I have not received an answer indicating that they are taking a deeper look at why OCSP Watch keeps flagging GoDaddy intermittently.
For example: https://crt.sh/?id=13553082022&opt=ocsp. This certificate is returning "ocsp.ParseResponseForCert => ocsp: error from server: unauthorized" at the time of opening this bug. I suspect that the problem will go away after a couple of minutes.
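For readers unfamiliar with the error above: "unauthorized" is a value of the OCSPResponseStatus field defined in RFC 6960, and it travels inside the OCSP response body. The following is a minimal sketch, using only the Python standard library, of how that status can be read from the DER bytes. The function names are illustrative and not part of any tool mentioned in this bug, and the sketch does not validate signatures or parse the rest of the response.

```python
# OCSPResponseStatus values as defined in RFC 6960, section 4.2.1.
OCSP_RESPONSE_STATUS = {
    0: "successful",
    1: "malformedRequest",
    2: "internalError",
    3: "tryLater",
    5: "sigRequired",
    6: "unauthorized",
}


def _read_header(der: bytes, offset: int) -> tuple[int, int, int]:
    """Return (tag, content_length, content_offset) for the DER element at offset."""
    tag = der[offset]
    first = der[offset + 1]
    if first < 0x80:  # short-form length
        return tag, first, offset + 2
    num_bytes = first & 0x7F  # long form: the next num_bytes hold the length
    length = int.from_bytes(der[offset + 2 : offset + 2 + num_bytes], "big")
    return tag, length, offset + 2 + num_bytes


def ocsp_response_status(der: bytes) -> str:
    """Extract the responseStatus from a DER-encoded OCSPResponse."""
    tag, _, body = _read_header(der, 0)
    if tag != 0x30:
        raise ValueError("OCSPResponse must start with a SEQUENCE")
    tag, length, value = _read_header(der, body)
    if tag != 0x0A or length != 1:
        raise ValueError("expected a one-byte ENUMERATED responseStatus")
    code = der[value]
    return OCSP_RESPONSE_STATUS.get(code, f"unknown status code {code}")


# An error response carries only the status: SEQUENCE { ENUMERATED 6 }.
UNAUTHORIZED = bytes.fromhex("30030a0106")
print(ocsp_response_status(UNAUTHORIZED))
```

Because an error response has no responseBytes, there is no certificate status (good, revoked, or unknown) to report in this case, which is why clients surface it as a server error.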
As far as I understand, OCSP responses are supposed to be available as soon as the precertificate is issued.
I can confirm that OCSP Watch is flagging GoDaddy and can see similar errors on crt.sh for recent certificates.
Thank you for assigning this to me. Just wanted to provide an update that we are reviewing this internally.
Comment 3•3 months ago
When I check these, they might appear in OCSP Watch, but then I immediately go to crt.sh, and they are reported as "good". What kind of latency are we talking about here? How soon after creation of a precertificate do these OCSP responses need to be available?
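One way to put a number on the latency question is to poll the responder for a freshly logged certificate until it stops answering "unauthorized". The sketch below assumes a hypothetical `fetch_status` callable that performs the OCSP query and returns a status string; it is not an existing API. The clock and sleep functions are injectable so the loop can be exercised without a network.

```python
import time
from typing import Callable, Optional


def seconds_until_available(
    fetch_status: Callable[[], str],
    timeout: float = 3600.0,
    interval: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the responder stops answering "unauthorized".

    Returns the elapsed seconds since polling started, or None on timeout.
    The measurement starts when polling starts, so it approximates the true
    latency only if polling begins right after the (pre)certificate is logged.
    """
    start = clock()
    while True:
        status = fetch_status()
        elapsed = clock() - start
        if status != "unauthorized":
            return elapsed
        if elapsed + interval > timeout:
            return None
        sleep(interval)


# Simulated responder (hypothetical): two failures, then a usable answer.
fake_time = [0.0]
answers = iter(["unauthorized", "unauthorized", "good"])
measured = seconds_until_available(
    lambda: next(answers),
    clock=lambda: fake_time[0],
    sleep=lambda s: fake_time.__setitem__(0, fake_time[0] + s),
)
print(measured)
```

The result is only as precise as the polling interval, which matters if a policy threshold in the range of minutes is being considered.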
Of some relevance, in the CA/B Forum, I had mentioned adopting service level requirements for OCSP uptime, but that was dropped or hasn't been brought up again. See https://lists.cabforum.org/pipermail/servercert-wg/2020-May/001905.html.
An issue was filed in GitHub (https://github.com/mozilla/pkipolicy/issues/214), but it did not address the latency encountered when new OCSP responses are distributed via a CDN, and we have not explored whether that time should be measured from publication of the precertificate (although we do require that OCSP responses be available for precertificates). No metric has been suggested for measuring this. Because of the complexity of setting such timeframes and the lack of feedback, I removed it from the version 2.9 changes to the MRSP.
Comment 4•3 months ago
As of 2024-07-02 19:57:06 UTC, this certificate had been deployed onto the TLS server and the OCSP responder was still responding unknown for it. So it's not just an issue with precertificates.
Assignee
Comment 5•3 months ago
Thank you again for bringing this to our attention. We wanted to provide a preliminary response to some of the items raised here. What you’re seeing is latency in our current OCSP batching process. We believe this latency can be reduced and are taking action to move from a batch process to an on-demand OCSP signing solution, targeting late Q3 2024/early Q4 2024. In the interim, we are continuing to look at other avenues to drive these numbers down (i.e. so the response being served is not “unauthorized”).
We would also like to clarify what the “unauthorized” response means for our certificates. When there is a delay with the current OCSP batch process, our system fails as “closed” to align with RFC 5019. We have our response fail in this way to align with the RFC and to err on the side of caution to prevent returning a “good” response when we should not.
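The fail-closed behaviour described above can be illustrated with a small sketch. It assumes a responder node that only serves pre-signed responses held in a local store; the class and method names are hypothetical and are not taken from GoDaddy's system. RFC 5019 allows "unauthorized" when a server is not capable of responding authoritatively, which is the case modelled here.

```python
from typing import Dict

# Minimal DER for an OCSPResponse whose responseStatus is unauthorized(6).
UNAUTHORIZED_RESPONSE = bytes.fromhex("30030a0106")


class PreSignedResponder:
    """Serves only responses that were signed ahead of time (hypothetical design)."""

    def __init__(self) -> None:
        self._responses: Dict[str, bytes] = {}

    def publish(self, serial_hex: str, signed_response: bytes) -> None:
        """Called when the sync process delivers a pre-signed response."""
        self._responses[serial_hex.lower()] = signed_response

    def respond(self, serial_hex: str) -> bytes:
        """Return the stored response, or fail closed with "unauthorized"."""
        stored = self._responses.get(serial_hex.lower())
        if stored is None:
            # Never synthesize "good" for a serial this node holds no response for.
            return UNAUTHORIZED_RESPONSE
        return stored


node = PreSignedResponder()
print(node.respond("0A1B").hex())         # before sync: the unauthorized error response
node.publish("0a1b", b"<signed bytes>")   # placeholder standing in for a signed response
print(node.respond("0A1B"))               # after sync: the stored response
```

The window between issuance and the `publish` call is the latency under discussion in this bug.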
Consistent with Ben’s observations in his comment above, we do not see a metric that we could use to baseline our OCSP responses that are propagated across CDNs. We would be happy to reignite some of the discussion that Ben mentioned above and drive a ballot proposal within the CA/B Forum to make the TLS BRs clearer and to better drive expectations and accountability in this space, especially if it minimizes confusion and concern from relying parties.
Comment 6•3 months ago
I'm wondering whether a 15-minute latency for publication of OCSP responses for pre-certificates would be something that should be adopted either in the Mozilla Root Store Policy or by the CA/B Forum. I filed an issue in GitHub for this: https://github.com/mozilla/pkipolicy/issues/280.
Comment 7•3 months ago
(In reply to Ben Wilson from comment #6)
I'm wondering whether a 15-minute latency for publication of OCSP responses for pre-certificates would be something that should be adopted either in the Mozilla Root Store Policy or by the CA/B Forum. I filed an issue in GitHub for this: https://github.com/mozilla/pkipolicy/issues/280.
As noted above, this issue appears to exist for final certificates, not just pre-certificates, so such a grace period would not help in this particular case.
It's slightly unclear to me whether the Root Programs want to treat this issue as an incident. The BRs do not contain a general prohibition against serving "unknown" OCSP responses for actually-issued certificates. RFC 6960 says that the "unknown" state indicates that "the responder doesn't know about the certificate being requested", which appears to be an accurate description of the latency inherent in GoDaddy's OCSP infrastructure.
The Mozilla Root Program requires that "CA operators MUST maintain an online 24x7 repository mechanism whereby application software can automatically check online the current status of all unexpired certificates issued by the CA" (emphasis added). These certificates are clearly issued and unexpired, and yet their status cannot be checked at the OCSP responder. However, I assume that GoDaddy also produces CRLs in compliance with BRs 4.9.7, and likely satisfies this requirement via that mechanism instead.
Regardless of whether this is a full-blown incident or not, it seems clear to me that this chronic OCSP latency is both: a) unique to GoDaddy; and b) unexpected and undesired. It feels like a violation of the spirit of the requirements, if not their letter.
As such, as a community member, I would like to request a full report matching the CCADB incident report template, detailing how GoDaddy's system design led to this issue and what steps (including changes to the BRs!) are being taken to remediate it.
Once such a report has been provided for the benefit of the community, I think it would likely be appropriate to close this issue as INVALID.
Comment 8•3 months ago
Thank you for the request, Aaron. (I'm responding on behalf of Star Simmons, who is on PTO this week. I am her manager at GoDaddy.)
We have been looking into the issue and agree we could be better. While we may be technically fulfilling the requirements, we are working to address these issues.
As we’ve investigated this closely, the best solution isn’t something that will be fixed overnight. So, our team is currently working on a short-term patch aimed at incremental improvements. We also have a better solution that we’re starting work on, but it’s longer term.
Regarding your request for a report, we plan on issuing a fully detailed report once we have finalized our project plans for the new long-term solution. We will do our best to fit this issue into the CCADB incident form, but we would like to note it may not be a perfect fit as this issue is not a formal violation.
Assignee
Comment 9•1 month ago
Summary
GoDaddy’s OCSP response sync mechanism experienced a degradation in performance as the CA scaled up issuance. This led to the intermittent “unauthorized” responses from GoDaddy’s OCSP responders for newly issued certificates until the response was propagated to the public responder nodes.
Impact
It is not possible to identify the number of impacted certificates as the issue was intermittent and there is no history we can look back at to say which certificates had a delay in OCSP response propagation.
Timeline
All times are in UTC
2024-06-27 03:20:00 - CPR to GoDaddy sent by Security Researcher regarding example certificate giving “Unauthorized” OCSP response
2024-06-27 11:54:00 - GoDaddy acknowledges the CPR and starts investigation
2024-06-27 19:14:00 - GoDaddy responds to CPR with findings
2024-06-28 19:25:00 - Bug Report 1905419 filed
2024-06-28 23:04:00 - GoDaddy acknowledges bug report
2024-07-04 01:05:00 - GoDaddy makes initial response regarding latency
2024-07-18 03:19:00 - GoDaddy commits to filing an incident report once short-term improvements are deployed
2024-07-19 00:10:00 - Responder syncing schedules tuned for more consistent propagation
2024-08-16 19:39:00 - Added additional script to “fast-track” propagation of newly generated responses to Responders
Root Cause Analysis
Background
GoDaddy currently uses a two-tiered OCSP Response generation system, where responses are pre-generated on a cluster, and then shipped to responder nodes that serve them up to clients when requested. As GoDaddy certificate issuance recently scaled higher, the OCSP response propagation time to OCSP responders increased. This delay caused our responders to return a 401 Unauthorized response for newly issued certificates upwards of an hour after issuance while the response was being shipped to the appropriate nodes.
Missed Requirement
GoDaddy does not believe this issue constitutes a violation of the CA/B Forum BRs, RFC 6960, or RFC 5019. Nevertheless, GoDaddy has made and is continuing to make improvements so that our OCSP system scales.
Requirement Fix
Improvements to the OCSP responder sync schedule for more consistent syncing
Fast-track syncing of OCSP responses for newly issued certificates
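The report does not describe how the fast-track script works. As one possible illustration only, the sketch below models the sync step as a priority queue in which first-time responses for newly issued certificates are shipped ahead of routine refreshes. All names here (`SyncQueue`, `FAST_TRACK`, `ROUTINE`) are assumptions made for this example.

```python
import heapq
import itertools
from typing import List, Tuple

FAST_TRACK = 0  # first response for a newly issued certificate
ROUTINE = 1     # periodic refresh of a response the nodes already hold


class SyncQueue:
    """Orders pending responses so new certificates are shipped first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority

    def enqueue(self, serial: str, newly_issued: bool) -> None:
        priority = FAST_TRACK if newly_issued else ROUTINE
        heapq.heappush(self._heap, (priority, next(self._order), serial))

    def next_batch(self, size: int) -> List[str]:
        """Pop up to `size` serials, fast-track entries first."""
        batch: List[str] = []
        while self._heap and len(batch) < size:
            batch.append(heapq.heappop(self._heap)[2])
        return batch


queue = SyncQueue()
queue.enqueue("refresh-1", newly_issued=False)
queue.enqueue("refresh-2", newly_issued=False)
queue.enqueue("new-1", newly_issued=True)
print(queue.next_batch(2))
```

A refresh for a certificate whose previous response is still valid can wait without any visible effect on clients, whereas a missing first response is what produces the "unauthorized" answers, so prioritising the latter targets the symptom directly.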
Deployment
See timeline above.
Lessons Learned
Our OCSP response generation and syncing mechanism needed an upgrade. The increased scale of certificate issuance created a tipping point in our ability to respond properly to OCSP requests for newly issued certificates.
We also learned that further monitoring of our OCSP response generation and syncing was required. We proactively added monitoring to alert if OCSP response syncing is falling behind.
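A monitor of the kind mentioned above could, as a rough sketch, track the age of the oldest generated response that has not yet reached the responder nodes. The data structures and the 15-minute threshold below are assumptions for illustration (the figure is borrowed from the discussion in comment 6); the report does not state what the actual monitoring measures.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Set


def sync_lag(generated_at: Dict[str, datetime], synced: Set[str], now: datetime) -> timedelta:
    """Age of the oldest generated response not yet present on the responder nodes."""
    pending = [ts for serial, ts in generated_at.items() if serial not in synced]
    if not pending:
        return timedelta(0)
    return now - min(pending)


def should_alert(lag: timedelta, threshold: timedelta = timedelta(minutes=15)) -> bool:
    """True when syncing has fallen further behind than the assumed threshold."""
    return lag > threshold


# Hypothetical snapshot: one response is still waiting 40 minutes after generation.
now = datetime(2024, 8, 16, 20, 0, tzinfo=timezone.utc)
generated = {
    "serial-a": now - timedelta(minutes=40),
    "serial-b": now - timedelta(minutes=5),
}
lag = sync_lag(generated, synced={"serial-b"}, now=now)
print(lag, should_alert(lag))
```

Alerting on the oldest pending item rather than on the average catches the case where most responses propagate quickly but a few are stuck.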
What went well
The improvements made to our OCSP response propagation have vastly improved our ability to properly respond to OCSP requests for newly issued certificates.
What didn't go well
Scale of OCSP response propagation did not match the increased scale of certificate issuance.
Where we got lucky
N/A
Action Items
| Action Item | Kind | Due Date |
| ----------- | ---- | -------- |
| Adjust response syncing schedules for more consistent propagation to Responders | Prevent | 2024-07-19 (completed) |
| Add additional script to “fast-track” propagation of newly generated responses to Responders | Prevent | 2024-08-16 (completed) |