Closed Bug 1410834 Opened 7 years ago Closed 7 years ago

Comodo: CAA Mis-Issuance on basic test case

Categories

(CA Program :: CA Certificate Compliance, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: quirin, Assigned: Robin.Alden)

Details

(Whiteboard: [ca-compliance] [dv-misissuance])

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

Steps to reproduce:

SSL.com issued a certificate for a test domain that has a very simple CAA setup:

issue ;
issuewild ;

Certificate: https://crt.sh/?id=235543115
Zone: http://dnsviz.net/d/gazebear.net/WefzKQ/dnssec/

I have reported this to sslabuse@comodo.com on Oct 23, 09:04 UTC.

Kind regards
Quirin
Hello,

For this particular order, SSL.com is simply a reseller for Comodo. SSL.com neither issued this certificate nor performed validation for it. The SubCA which issued this certificate does not chain to any SSL.com roots. Comodo has been notified, and a revocation request for this certificate has been sent to them as well.

Regards,
Leo Grove
Assignee: kwilson → robin
Summary: SSL.com: CAA Mis-Issuance on basic test case → Comodo: CAA Mis-Issuance on basic test case
Whiteboard: [ca-compliance]
Flags: needinfo?(robin)
Hi, is there any update on this? Kind regards Quirin
Bug 1398545 has been tracking Comodo's response to this issue. I think we can probably close this one? Gerv
Hi, Bug 1398545 treated CAA mis-issuances by Comodo under the Comodo brand, which was fixed and retested long (weeks) before this incident. This is a new case, which could be rooted in differential treatment for certificates resold by SSL.com. IMHO, we could either bring the new insights from this bug to Bug 1398545 or treat it separately. Kind regards Quirin
Robin: comments? Gerv
Robin: ping? Gerv
Hi Gerv, I will have a reply for you by tomorrow, December 8. Regards Robin
Hi Quirin, Gerv,

All Comodo-issued SSL Certificates get exactly the same CAA treatment, whether sold by us or resold by SSL.com or by anyone else. That has always been the case.

INCIDENT REPORT

PREAMBLE

As a Certification Authority, there are various checks we make before we issue an SSL certificate.
1) We check that the applicant has control of the domain. We call this Domain Control Validation, or DCV for short.
2) We look for a particular type of DNS record for the domain that allows the subscriber to specify who may issue certificates for their domain. This is the CAA record lookup.

That first check, DCV, does not require that there is working DNS for the domain, or even for the registrable domain under which the FQDN falls. That sounds strange but is the case because one of the acceptable methods of DCV is that the applicant can receive an email from us at an email address specified in the WHOIS record for the registered domain name, and that doesn’t require anyone to have set up DNS on the domain. Not only does it *not* require DNS that *resolves* for the domain name, but the DNS can be thoroughly misconfigured for the domain so that every query times out or gives an error and we can still go ahead.

The CAA check, however, is more pernickety. Our treatment of errors in getting a CAA record for the domain is (to some degree) specified for us in the BRs. CAs are permitted to treat a [CAA] record lookup failure as permission to issue if:
• the failure is outside the CA's infrastructure;
• the lookup has been retried at least once; and
• the domain's zone does not have a DNSSEC validation chain to the ICANN root.

The challenge is that, according to that list, for some domains CAA now requires a decision about whether a DNS zone is correctly signed. If the zone is unsigned then we may issue a certificate even if all the DNS lookups time out or indicate failure.
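The three BR conditions quoted above combine into a single yes/no decision. The following is a minimal sketch of that decision, not Comodo's implementation: the class `CaaLookupFailure`, its field names, and the function `may_issue_after_failure` are all hypothetical, and treating an undetermined DNSSEC status as "do not issue" is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaaLookupFailure:
    """Facts about a failed CAA lookup (hypothetical field names)."""
    outside_ca_infrastructure: bool       # failure was not caused by the CA's own systems
    attempts: int                         # total lookups made, including the first
    zone_has_dnssec_chain: Optional[bool]  # None means "could not be determined"


def may_issue_after_failure(failure: CaaLookupFailure) -> bool:
    """Return True only if all three quoted conditions hold."""
    retried_at_least_once = failure.attempts >= 2
    # Assumption: an undetermined DNSSEC status is NOT treated as "unsigned".
    zone_known_unsigned = failure.zone_has_dnssec_chain is False
    return (
        failure.outside_ca_infrastructure
        and retried_at_least_once
        and zone_known_unsigned
    )


if __name__ == "__main__":
    timeout_on_signed_zone = CaaLookupFailure(True, 3, True)
    print(may_issue_after_failure(timeout_on_signed_zone))
```

The sketch makes the difficulty visible: the first two inputs are easy to record, while the third requires a reliable answer to "is this zone signed?" at exactly the moment when lookups for the zone are failing.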
HOW WE FIRST BECAME AWARE

We have been working on implementing and re-implementing a reliable ‘is the zone signed’ check for a while and you might reasonably think that it was a solved problem, but DNS in general is good at answering the question ‘does that name resolve, and to what value?’ and poor at answering most other questions.

Until 13-October-2017, our CAA checking implementation effectively assumed that "the domain's zone does not have a DNSSEC validation chain to the ICANN root". That was an invalid assumption in the general case and so our implementation had to change.

On 13-October-2017, Quirin found this hole in our implementation and obtained a certificate https://crt.sh/?id=229495637 for a domain that did have a DNSSEC validation chain but whose DNS server was configured to time out for CAA records.

TIMELINE OF ACTIONS TAKEN

While formulating a response to the report received on 13-October-2017, it was indicated to us that at least one other CA (Let’s Encrypt) failed closed in this situation, and we made a change so that when a CAA lookup times out or gets a SERVFAIL response, we failed closed (i.e. we did not issue the certificate).

Failing closed for errors on DNSSEC lookups for CAA records when those errors were recurrent through multiple retries turned out to be a disaster from a customer support perspective. It meant we were blocking the issuance of thousands of certificates per day, and in almost all cases the DNS zone was not, in fact, signed.

From 17-October-2017 we allowed our support staff to manually override the DNSSEC errors for the CAA checks after they had used online tools to check that the DNS zone had no signature. This allowed some of the blocked certificate requests to be issued, but this manual override process could not keep up with the demand.

On 18-October-2017 we reverted to failing open. This triggered the issuance of another test certificate request previously submitted by Quirin, as detailed in comment#0.
On 23-October-2017 we identified an edge case with DNSSEC content which was still causing some domains to be effectively un-issuable by us: the DNS query response was sufficiently long that it overflowed a logging field we had assigned for the purpose. Our support staff established workarounds for the affected customers, which typically had the customer define an A record for the domain name they wanted in the certificate, even if they otherwise had no need for the domain name to resolve in DNS. Affected customers were therefore experiencing relatively long issuance times for certificates with domain names hitting this issue.

On 3-November-2017 we rolled out two changes to our CAA checking error processing to reduce the burden on our support staff:
a) it correctly handles the long DNS responses so that support intervention and DNS changes are not needed;
b) when faced by persistent error indications from our own recursive resolvers, it automatically decides whether the zone has a DNSSEC validation chain by performing an ANY lookup for the domain against a 3rd party external resolver and checking for the AD bit in the response.

As of this date we again fail closed for CAA where no CAA record can be retrieved but there is an indication that the zone is signed. The issues with our CAA lookups which were causing unnecessary delays for customers with correctly configured DNSSEC chains getting their certificates were then resolved.

ACTIONS OUTSTANDING

When we have persistent error indications from a domain, we are relying on responses from recursive resolvers to determine if the zone is signed. To mitigate the threat that an attacker might pose by DOSing resolvers to induce us to believe that a domain’s DNS response always errors when in fact the domain is signed, we use resolvers from multiple providers.
We feel we can further improve this resilience by implementing our own top-down DNSSEC validation: going back to querying the root and the authoritative servers for the zones in the chain, to make a better determination in those edge cases where we could otherwise struggle to tell whether a zone is signed but unavailable for some reason or unsigned but broken. Restating why that matters: if it is signed but unavailable then we may not issue (because there might be a CAA record in there that we can’t fetch), but if it is unsigned and broken then we may issue.

POSTAMBLE

CAA checking that always fails closed on timeout/SERVFAIL/etc. is relatively simple, but we've discovered that the implementation complexity skyrockets when you try to fail open in a compliant fashion. Specifically, it's no longer effective to use 3rd party recursive resolvers to do the DNS(SEC) heavy lifting. That in turn requires the development of new DNSSEC clients whose behaviour is specific to our task. This is still a work in progress.
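The report describes deciding whether a zone is signed by checking for the AD bit in a response from an external resolver. As an illustration only, the sketch below shows where that bit lives in a raw DNS message header and how it might be read together with the response code. The function names `parse_header_flags` and `response_indicates_signed_zone` are hypothetical, sending the query is out of scope, and nothing here is claimed to match how Comodo's check was written. A validating resolver normally sets AD only when the query asked for it (DO or AD set), and the bit is only as trustworthy as the path to the resolver.

```python
import struct

QR_FLAG = 0x8000     # message is a response
AD_FLAG = 0x0020     # Authenticated Data: resolver validated the answer via DNSSEC
RCODE_MASK = 0x000F  # low four bits of the flags word hold the response code


def parse_header_flags(message: bytes) -> dict:
    """Extract QR, AD and RCODE from the fixed 12-byte DNS header."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return {
        "qr": bool(flags & QR_FLAG),
        "ad": bool(flags & AD_FLAG),
        "rcode": flags & RCODE_MASK,
    }


def response_indicates_signed_zone(message: bytes) -> bool:
    """True if this is a successful response (RCODE 0) with the AD bit set."""
    header = parse_header_flags(message)
    return header["qr"] and header["rcode"] == 0 and header["ad"]


if __name__ == "__main__":
    # Hand-built header: QR + RD + RA + AD set, RCODE 0 (NOERROR).
    sample = struct.pack("!HHHHHH", 0x1234, 0x81A0, 1, 1, 0, 0)
    print(parse_header_flags(sample))
```

A missing AD bit does not prove the zone is unsigned: a SERVFAIL from a validating resolver can equally mean that validation failed on a signed zone, which is the ambiguity the ACTIONS OUTSTANDING section describes.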
Flags: needinfo?(robin)
Hi Robin,

thank you for the incident report. This specific case was a basic test case, responding "issue ;" to all CAA queries. The zone was correctly signed at that time, and responded to queries [1] -- so it should not be related to the treatment of lookup failures on signed zones.

> On 18-October-2017 we reverted to failing open.
> This triggered the issuance of another test certificate request previously submitted by Quirin, as detailed in comment#0.

Are you saying you received a lookup failure for that zone? As stated above, that zone was responsive. I will be happy to look at the pcaps of the authoritative name servers if you can give me a precise time stamp. I (and [1]) generally see responses to CAA queries on 2017-10-19.

Kind regards
Quirin

[1] http://dnsviz.net/d/gazebear.net/WefzKQ/dnssec/
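For readers unfamiliar with the test case: an "issue ;" property carries an empty issuer domain name, which under RFC 6844 authorizes no CA at all. The sketch below illustrates that rule under simplifying assumptions: records are plain (tag, value) pairs, the critical flag and unknown tags are ignored, tree climbing is omitted, and `ca.example` stands in for a CA's identifying domain. The function names are hypothetical.

```python
def issuer_domain(value: str) -> str:
    """Return the issuer domain from an issue/issuewild value, dropping any parameters."""
    return value.split(";", 1)[0].strip().lower()


def ca_may_issue(records, ca_domain: str, wildcard: bool = False) -> bool:
    """Evaluate a relevant CAA record set for one CA (simplified).

    records: iterable of (tag, value) pairs, e.g. [("issue", ";")].
    """
    issue = [value for tag, value in records if tag.lower() == "issue"]
    issuewild = [value for tag, value in records if tag.lower() == "issuewild"]
    # For wildcard requests, issuewild properties take precedence when present.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        # No applicable property: CAA places no restriction on this request.
        return True
    return any(issuer_domain(value) == ca_domain.lower() for value in relevant)


if __name__ == "__main__":
    test_zone = [("issue", ";"), ("issuewild", ";")]
    print(ca_may_issue(test_zone, "ca.example"))
```

With the record set from this bug, the function returns False for every CA domain, for wildcard and non-wildcard requests alike, so a certificate could only have been issued if the records were never retrieved or were not honoured.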
Flags: needinfo?(robin)
Robin: this bug has been NEEDINFO you for a month now; when might we see a response? Gerv
(In reply to Quirin Scheitle from comment #9)
> thank you for the incident report. This specific case was a basic test case,
> responding "issue ;" to all CAA queries.
> The zone was correctly signed at that time, and responded to queries [1] --
> so it should not be related to the treatment of lookup failures on signed
> zones.
>
> > On 18-October-2017 we reverted to failing open.
> > This triggered the issuance of another test certificate request previously submitted by Quirin, as detailed in comment#0.
>
> Are you saying you received a lookup failure for that zone? As stated above,
> that zone was responsive. I will be happy to look at the pcaps of the
> authoritative name servers if you can give me a precise time stamp. I (and
> [1]) generally see responses to CAA queries on 2017-10-19.
>

Our CAA lookup for gazebear.net for this certificate at 2017-10-18 21:54:57 UTC timed out. The log doesn't indicate whether this was an internal or an external problem, but since we were failing open at that time we issued the certificate.

Regards
Robin Alden
Comodo CA
Flags: needinfo?(robin)
(In reply to Robin Alden from comment #11)
> Our CAA lookup for gazebear.net for this certificate at 2017-10-18 21:54:57
> UTC timed out. The log doesn't indicate whether this was an internal or an
> external problem, but since we were failing open at that time we issued the
> certificate.

I received the DV e-mail at 2017-10-18 20:10 UTC and the e-mail with the certificate at 2017-10-19 16:10 UTC.

I looked at the pcap files from both authoritative name servers for 2017-10-18 from 20:00 UTC through 23:00 UTC. The files show CAA lookups for gazebear.net from Comodo-associated IP addresses at 20:13 and 21:11 UTC. They show a total of 16 queries from 3 source IP addresses (2 IPv6, 1 IPv4), directed to both of our NS [happy to share the pcap file privately if it helps], all responded to by our NS.

This might be the odd case, unlikely but possible, in which none of the 16 replies we sent was received.

Anyway, given that Comodo has already taken several steps to address DNS-lookup-related issues, I assume there is nothing left to do for this ticket.
QA Contact: gerv → wthayer
Resolving per Quirin's comment.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [dv-misissuance]