Closed Bug 1651026 Opened 1 year ago Closed 10 months ago

Izenpe: certificate issued to internal domain

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: o-garcia, Assigned: o-garcia)

Details

(Whiteboard: [ca-compliance] Next Update 2020-12-01)

Attachments

(1 file)

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

One of our internal detection systems warned us about the misissuance of one certificate

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Jul 6th 12:32: our internal detection system warned us about the issuance to an internal domain
Jul 6th 12:35: the affected certificate was revoked
Jul 6th: we investigated the root cause, including the source of the request and the system that issued the problem certificate. We identified that the information in the CSR differed from the information in the application form. Although we already have a web application in production that filters those CSRs, some of our customers still send us the application form by email, signing the form with a qualified representative certificate, and those certificates are validated manually.
Jul 6th: we ran a script over our existing certificate database to identify certs affected by this issue. No additional certificates were identified.
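
A database scan of this kind can be sketched as below. This is only an illustration under stated assumptions, not the script Izenpe ran: `PUBLIC_TLDS`, `is_internal_name` and `find_affected` are hypothetical names, and the TLD set is a tiny sample where a real scan would load the full list of delegated TLDs.

```python
# Hypothetical sample of public TLDs; a real scan would load the complete list.
PUBLIC_TLDS = {"eus", "com", "org", "es"}


def is_internal_name(dns_name: str, public_tlds: set[str] = PUBLIC_TLDS) -> bool:
    """Return True when the rightmost label is not a known public TLD."""
    labels = dns_name.rstrip(".").lower().split(".")
    return labels[-1] not in public_tlds


def find_affected(certs: dict[str, list[str]]) -> list[str]:
    """Return the IDs of certificates that carry at least one internal SAN."""
    return [
        cert_id
        for cert_id, sans in certs.items()
        if any(is_internal_name(name) for name in sans)
    ]


# Illustrative data only: certificate ID mapped to its SAN dNSNames.
sample = {
    "cert-a": ["www.barakaldo.eus"],
    "cert-b": ["www.barakaldo.euskalsarea"],
}
print(find_affected(sample))
```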

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

Jul 7th: we have required all our customers to use the web application, which automatically builds a new subject for the received CSR from the information in the application form.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

https://crt.sh/?id=3048690947

  5. The complete certificate data for the problematic certificates.

https://crt.sh/?id=3048690947

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Manual validation in some cases.

  7. List of steps CA is taking to resolve the situation and ensure it will not be repeated.

Require use of our web application in all cases.

Oscar,

Thanks for this report. This is a major issue, and I'm not sure the incident report provides sufficient detail to indicate Izenpe understands it as such. This issue demonstrates Izenpe has systems capable of issuance without validating the domain name, and that's deeply concerning.

I'm hoping you can be more comprehensive in your explanation of how this happened, and more comprehensive in the steps you are taking holistically to ensure no unvalidated certificates are issued. In 2020, even a single failure of domain validation controls represents an egregious breach of trust, especially when there are meant to be layers of defense against this. For example, CAA checks could clearly not have been performed with this certificate, so it raises the question of whether CAA was checked at all for any manual certificates.

As part of your incident report, which will explain in full detail how manual validation was performed and what controls existed, as well as full detail about how you're ensuring that the only path for issuance forward is through a limited API (e.g. by providing a full architectural diagram/flowchart of your issuance process), I'd also like to ask you provide a full list of all unexpired certificates that have been manually validated.

Flags: needinfo?(o-garcia)
Assignee: bwilson → o-garcia
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Oscar, I agree entirely with Ryan. Much more information is needed in the incident report - especially in sections 6 and 7.

Process review:

Until July 7th we had an alternative path for requesting a TLS certificate. Customers could send a request form signed by the entity's representative with an eIDAS qualified certificate. It was a contingency route for cases where a technical problem prevented them from processing the request through the web application. We had a specific email contact for that purpose. The certificates processed that way in the last 2 years are:

www.barakaldo.eus https://crt.sh/?id=3052537874
www.zestoa.eus https://crt.sh/?id=2974182442
www.aulesti.eus https://crt.sh/?id=1665344033
www.astigarraga.eus https://crt.sh/?id=2964904131

The validation and issuance process in those cases is:

1.- The validation team receives the email. That email encloses the signed PDF with the request form and the CSR
2.- The validation team verifies all information in the request form against trusted sources, according to the BRs/EVGs. Evidence of each validation is recorded in a document, which is digitally signed by the validation operator. In all cases we verify CAA and high risk requests.
3.- In case the application is for a qualified certificate, we also generate a validation minutes document, which is digitally signed and sent to a different member of the validation team. This second operator reviews all validations made, and in case everything is correct, signs the document too, and it's sent to a third member of the validation team, which issues the certificate according to the application form, validation minutes document and the CSR.
4.- Our PKI software provides some preventive controls to avoid misissuances:

  • Public key (size, public exponent, modulus, etc.)
  • Subject and SAN: conformance with RFC 5280, the CABF BRs and the CABF EVGs. We have found that the last patch disabled this control, so it was not applied to requests that didn't come from the web application.

5.- Once the certificate is issued, detection controls that we developed ourselves verify that the issued certificate complies with the BRs/EVGs. One of these controls warned us of this misissued certificate.

Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now:

We have an internal root CA for TLS certificates with internal domains. The email account used to send requests for this internal CA is the same as the contingency email account for the web application. On July 6th we had some network problems, so this customer sent us two different request forms by email, one for the external CA and another for the internal CA. We processed both requests, and both validated correctly. The problem came at issuance: the operator used the CSR from the internal request when issuing the TLS certificate for the external domain names. This wouldn't have happened with our web application, because it regenerates the CSR with the information from the application form, and verifies Subject and SANs.
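
The cross-check that was missing on the email path can be sketched as a comparison between the names in the CSR and the names validated from the request form. This is a minimal illustration; `unvalidated_names` is a hypothetical helper, not part of Izenpe's web application or PKI software.

```python
def unvalidated_names(validated_names: set[str], csr_names: set[str]) -> set[str]:
    """Return the names present in the CSR that were never validated.

    Names are compared case-insensitively and without a trailing dot.
    """
    def normalize(names: set[str]) -> set[str]:
        return {name.rstrip(".").lower() for name in names}

    return normalize(csr_names) - normalize(validated_names)


# The request form was validated for the external name, but the CSR that
# reached issuance carried the internal one.
leftover = unvalidated_names(
    {"www.barakaldo.eus"},
    {"www.barakaldo.euskalsarea"},
)
if leftover:
    print("refuse issuance, unvalidated names:", sorted(leftover))
```

Any non-empty result would be grounds to stop before issuance, regardless of which path the request arrived on.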

List of steps CA is taking to resolve the situation and ensure it will not be repeated.

  • Avoid any alternative path to the web application -> done (July 7th)
  • Enable the validation of subject and SAN in our PKI software -> done today (July 10th)
Flags: needinfo?(o-garcia)

I suppose I’m still missing important details in order to understand how this system works.

  • Can you describe what 3.2.2.4 validation methods you use for such requests?
  • Can you describe how you verify CAA? You said “In all cases we verify CAA”, but I can’t see how that can be true, since CAA should have signaled that something was amiss. Did you get an NXDOMAIN and assume it was for the CAA record and not for the domain itself? Understanding how you validate helps understand if there were places this could have been caught.
  • Can you provide a timeline for “We have found that the last patch disabled this control”? Understanding when it was disabled and how it wasn’t noticed until now seems like a useful and valuable exercise.
  • Is it correct to understand that “ Avoid any alternative path to the web application” means disabling the ability for validation agents to approve CSRs?

Having a split between a public and a private PKI is definitely a good design. Have you looked into why the operator was confused about which PKI to issue this certificate from? Is there any opportunity to improve the ability to distinguish between these? I realize this may be seen as moot, because you disabled manual issuance. However, it sounds like there’s still validation that may happen using different methods for public/private use cases, and making sure the validation agent clearly understands seems useful?

Flags: needinfo?(o-garcia)

(In reply to Ryan Sleevi from comment #4)

I suppose I’m still missing important details in order to understand how this system works.

  • Can you describe what 3.2.2.4 validation methods you use for such requests?

www.barakaldo.eus https://crt.sh/?id=3052537874

The applicant selected to use the “3.2.2.4.18 Agreed-Upon Change to Website v2” method

  1. One member of our validation team sent an email with a new random value in a text file to the contact in the request form.
  2. The applicant published that file to www.barakaldo.eus/.well-known/pki-validation/<file name>.
  3. Our validation team verified that the published file contents were the same as they had sent
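
The three steps above amount to generating a fresh random value and later comparing it with the published file contents. A minimal sketch, assuming the file body has already been fetched as text; `new_random_value` and `file_matches` are hypothetical names, and stripping surrounding whitespace before comparing is an assumption of this example.

```python
import hmac
import secrets


def new_random_value() -> str:
    """Generate a fresh random value for one validation attempt.

    16 random bytes give 128 bits of entropy, above the 112-bit minimum
    that the Baseline Requirements set for a Random Value.
    """
    return secrets.token_hex(16)


def file_matches(expected: str, published: str) -> bool:
    """Compare the value that was sent with the contents of the published file."""
    return hmac.compare_digest(
        expected.strip().encode("utf-8"),
        published.strip().encode("utf-8"),
    )


value = new_random_value()
print(file_matches(value, value + "\n"))
```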

www.zestoa.eus https://crt.sh/?id=2974182442

The applicant selected to use the “3.2.2.4.2 Email, Fax, SMS, or Postal Mail to Domain Contact” method

  1. One member of our validation team asked for whois contacts (registrant, technical and administrative) for zestoa.eus domain
  2. Our team member sent an email with a new random value to the registrant email contact
  3. Our validation team verified that the received email included the same random value as they had sent

www.aulesti.eus https://crt.sh/?id=1665344033

The applicant selected to use the “3.2.2.4.6 Agreed-Upon Change to Website” method

  1. One member of our validation team sent an email with a new random value in a text file to the contact in the request form.
  2. The applicant published that file to www.aulesti.eus/.well-known/pki-validation/<file name>.
  3. Our validation team verified that the published file contents were the same as they had sent

www.astigarraga.eus https://crt.sh/?id=2964904131

The applicant selected to use the “3.2.2.4.2 Email, Fax, SMS, or Postal Mail to Domain Contact” method

  1. One member of our validation team asked for whois contacts (registrant, technical and administrative) for astigarraga.eus domain
  2. Our team member sent an email with a new random value to the registrant email contact
  3. Our validation team verified that the received email included the same random value as they had sent
  • Can you describe how you verify CAA? You said “In all cases we verify CAA”, but I can’t see how that can be true, since CAA should have signaled that something was amiss. Did you get an NXDOMAIN and assume it was for the CAA record and not for the domain itself? Understanding how you validate helps understand if there were places this could have been caught.

CAA validation in these four cases was made manually. In case of this misissued certificate, the validation was made on www.barakaldo.eus, because that was the domain name in the request form. The problem was with the CSR, that was for the internal CA, exactly for www.barakaldo.euskalsarea

  • Can you provide a timeline for “We have found that the last patch disabled this control”? Understanding when it was disabled and how it wasn’t noticed until now seems like a useful and valuable exercise.

That patch was applied on February 11th. We didn’t realize that the PKI software’s check for internal domains had stopped working. That check is also made in the web application, and in the scripts that verify all issued certificates (that’s how we became aware of this misissuance).
We enabled that check again on July 10th, when we found it disabled while reviewing our systems as a consequence of this incident.

  • Is it correct to understand that “ Avoid any alternative path to the web application” means disabling the ability for validation agents to approve CSRs?

That’s correct, now that all requests come from the web application, validation agents don’t need to see or validate the CSRs, because they’re automatically generated by the app, with the information from the request form.

Having a split between a public and a private PKI is definitely a good design. Have you looked into why the operator was confused about which PKI to issue this certificate from? Is there any opportunity to improve the ability to distinguish between these? I realize this may be seen as moot, because you disabled manual issuance. However, it sounds like there’s still validation that may happen using different methods for public/private use cases, and making sure the validation agent clearly understands seems useful?

On July 10th we created a separate environment to manage TLS certificates for the internal CA. This includes a specific email contact, repository, etc.

Flags: needinfo?(o-garcia)

Thanks for clarifying.

It's not clear to me: Why are CSRs necessary to generate the information to issue certificates? Could you describe more about your CA system architecture?

I think the added details in Comment #5 are helping build a better picture of what went wrong, and how these fixes address it, but I'm trying to make sure nothing is overlooked.

If I understand correctly, the system worked effectively as:

  • A request comes in (e.g. via a website)
  • This request goes into some sort of database
  • The validation agent validates this information, updating the database with the validated information
  • An internal system produces a CSR with that validated information
  • (That CSR is... delivered somehow? by hand? by e-mail? copying over to another system? I'm not sure how to parse, given that "validation agents don't need to see ... the CSRs")
  • The CA extracts the information from the CSR and produces the cert

That is, rather than the CA talking to the database with the validated information, the information is contained within the CSR. The CSR is presumably not signed (since the CA can't sign the CSR). I'd be concerned if the CSR from the Subscriber was used, but comments like "automatically generated by the app" make me think this is not the case.

Prior to this fix, validation agents could manually bring CSRs, from the Subscriber, to the CA software. This would have the effect as appearing indistinguishable from fully validated data. The issue here is that the CSR used was not validated, and so it was like injecting arbitrary API commands or "equivalent" (in effect) as if the validation database had been compromised.

The fix has been to reduce the API surface, so that validation agents can't directly bypass these APIs.

I think having a diagram of the validation flow, like other CAs have provided, is useful to making sure we're not overlooking other possible issues.

Flags: needinfo?(o-garcia)

We don't generate CSRs and we don't reuse them in any case; we always ask the customer to provide one. The web application rebuilds the provided CSR with the subject information from the validated data, and the public key from the CSR sent by the customer. We attach a diagram to try to explain the validation flow when the certificate https://crt.sh/?id=3048690947 was misissued. The current flow is the same except that, as previously indicated, since July 7th we only accept requests from the web application, so the process is now just the right side of the flowchart. The current process is:

1.- A request with a CSR comes in from the web application.
2.- The request and the CSR go into a database
3.- The validation agent validates the information from the request, updating the database with the validated information
4.- The RA operator downloads from the web application a new CSR generated with:
* Public key of the original CSR sent by the customer
* Subject from validated data
5.- The RA operator issues the certificate
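
Step 4 can be modelled as below. This is a sketch of the idea only, using plain data classes rather than real PKCS#10 structures; `Csr`, `ValidatedRequest` and `rebuild_csr` are hypothetical names, and carrying the SAN list over from the validated data is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Csr:
    subject: str
    san: tuple[str, ...]
    public_key: bytes


@dataclass(frozen=True)
class ValidatedRequest:
    subject: str
    san: tuple[str, ...]


def rebuild_csr(customer_csr: Csr, validated: ValidatedRequest) -> Csr:
    """Build the request used for issuance.

    Only the public key is taken from the customer's CSR; every name
    comes from the validated data.
    """
    return Csr(
        subject=validated.subject,
        san=validated.san,
        public_key=customer_csr.public_key,
    )


customer = Csr("CN=www.barakaldo.euskalsarea", ("www.barakaldo.euskalsarea",), b"\x01\x02")
validated = ValidatedRequest("CN=www.barakaldo.eus", ("www.barakaldo.eus",))
print(rebuild_csr(customer, validated))
```

With this shape, a CSR containing the wrong names cannot influence the issued certificate, because its names are discarded.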

By the end of September we plan to have integrated the web application with the PKI environment, so that the certificate can be issued directly from the application (provided all validations are correct) without needing a human operator to issue it.

In the case of the misissued certificate, the information validated from the request form was correct (www.barakaldo.eus). The problem was that the customer requested two different certificates from us at the same time, one for the external domain (www.barakaldo.eus) and the other for the internal domain (www.barakaldo.bizkaia.euskalsarea).

Now we have reduced the risk because:

  • We have two different environments for internal and external CAs
  • We require all requests to come from the web application
  • We have enabled preventive controls in the PKI software
Flags: needinfo?(o-garcia)
Attached image flowchart.png

(In reply to Oscar Garcia from comment #7)

CAA validation in these four cases was made manually. In case of this misissued certificate, the validation was made on www.barakaldo.eus, because that was the domain name in the request form. The problem was with the CSR, that was for the internal CA, exactly for www.barakaldo.euskalsarea

Now, this is very interesting. Can you describe in more detail how this process works, please?

  • How did you implement the algorithm into your RA operators? RFC6844 is a very complex RFC, considering all its innocent-looking details. How much training did you spend on this?
  • Given the failure to correctly match information from the CSR to information in the "PDF", can you be certain that your operators would spot that you are not allowed to issue certificates for domains with a CAA issue "izenpe.euskalsarea", "ize\0npe.eus", "izenpe.eus\0", "izenpe.eus.", or "izenpe.eus;;" record?
  • Can you be sure that you checked CAA records (in the most restrictive case) 8 hours before issuing? What if delays happen in the workflow? Can you be sure they never happened?

(In reply to paul.leo.steinberg from comment #9)

(In reply to Oscar Garcia from comment #7)

CAA validation in these four cases was made manually. In case of this misissued certificate, the validation was made on www.barakaldo.eus, because that was the domain name in the request form. The problem was with the CSR, that was for the internal CA, exactly for www.barakaldo.euskalsarea

Now, this is very interesting. Can you describe in more detail how this process works, please?

  • How did you implement the algorithm into your RA operators? RFC6844 is a very complex RFC, considering all its innocent-looking details. How much training did you spend on this?

I don't see the benefit for the community of knowing the length of our training. What do you consider to be enough? As indicated above, we only used manual verification in those four cases, and none of them has any CAA record.

  • Given the failure to correctly match information from the CSR to information in the "PDF", can you be certain that your operators would spot that you are not allowed to issue certificates for domains with a CAA issue "izenpe.euskalsarea", "ize\0npe.eus", "izenpe.eus\0", "izenpe.eus.", or "izenpe.eus;;" record?

Same as above: we only used manual verification in those four cases, and none of them has any CAA record. In all other cases our customers used our web application, where the CAA check is made automatically.

  • Can you be sure that you checked CAA records (in the most restrictive case) 8 hours before issuing? What if delays happen in the workflow? Can you be sure they never happened?

Our internal procedure requires the validation agent to save evidence of all validations, including the CAA check. The RA operator must then verify that evidence, and if the time between the CAA check and the issuance is longer than 8 hours, the CAA check must be done again.
That process will change when we finish the implementation of the full web application (by the end of September), where the issuance process will be done automatically.

Like Comment #9, I have trouble understanding how the CAA check could possibly be done manually, reliably. I think the questions raised in Comment #9 are relevant to understanding what, exactly, the process being followed is, because it just seems difficult to imagine that it would be correct.

That's relevant to understanding both the root cause(s) of this issue, but also important to understanding and being confident that the automated solution, in Comment #10, is also going to produce the correct results.

Flags: needinfo?(o-garcia)

As indicated above, before we removed the possibility to send us the request form using a different path to the web application there was a validation team that checked the CAA (among all the rest of validations) using https://toolbox.googleapps.com/apps/dig/#CAA/. In 99% of cases there was not any CAA record, the validation was correct. In the rest of cases (1%) where there was a CAA record, it was escalated to the CSO as an internal incident, and resolved by him. So only two people (CSO and his backup) had to know the details of CAA. All validations are nevertheless included in the training of the validation team, so they knew how it works (generic scenarios, not something like ize\0npe.eus). Keep in mind that the business of Izenpe is to issue certificates for the Basque local administration, so the number of SSL certificates we issue each year is very limited.
All validations are recorded with the time at which they were performed, including the CAA check. The issuance operator has to verify that CAA was checked within the preceding 8 hours. If more than 8 hours have elapsed, the request is returned to the validation team and the certificate is not issued.
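
The 8-hour rule described here can be expressed as a simple timestamp comparison. A minimal sketch, assuming both timestamps are timezone-aware; `caa_check_is_fresh` and `MAX_CAA_AGE` are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta, timezone

# Maximum age of a CAA check at issuance time, as described in the thread.
MAX_CAA_AGE = timedelta(hours=8)


def caa_check_is_fresh(
    checked_at: datetime,
    issuing_at: datetime,
    max_age: timedelta = MAX_CAA_AGE,
) -> bool:
    """Return True when the recorded CAA check is recent enough to issue.

    A check timestamped after the issuance time is rejected as well,
    since it points to a clock or record-keeping error.
    """
    age = issuing_at - checked_at
    return timedelta(0) <= age <= max_age


checked = datetime(2020, 7, 6, 9, 0, tzinfo=timezone.utc)
print(caa_check_is_fresh(checked, checked + timedelta(hours=3)))
print(caa_check_is_fresh(checked, checked + timedelta(hours=9)))
```

Enforcing this inside the issuance system, rather than relying on an operator reading timestamps, removes the dependence on workflow delays being noticed by a person.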

The validation is now done automatically by the web application, and by the end of next week we plan to add a redundant control in our PKI software that automatically validates CAA before issuance.

Flags: needinfo?(o-garcia)

As indicated above, before we removed the possibility to send us the request form using a different path to the web application there was a validation team that checked the CAA (among all the rest of validations) using https://toolbox.googleapps.com/apps/dig/#CAA/. In 99% of cases there was not any CAA record, the validation was correct. In the rest of cases (1%) where there was a CAA record, it was escalated to the CSO as an internal incident, and resolved by him. So only two people (CSO and his backup) had to know the details of CAA

This is not an accurate description of how to validate CAA. There are numerous scenarios where a lookup does not return a CAA record yet the CA is still forbidden to issue. Based on this description, I believe that every certificate issued using this process is misissued and needs to be revoked along with an incident report - as was required of other CAs which issued certificates without properly following the CAA specification. This also calls into question whether Izenpe's automated CAA checking is correct.

Also, could you please provide the numbers (total number of validations, and number of validations that you believe did not have a CAA record) that you used to calculate the 99% figure?

Keep in mind that the business of Izenpe is to issue certificates for the Basque local administration,

This statement does not appear to be true, as looking through the certificates from just one intermediate reveals quite a few certificates issued to organizations in France and Iran: https://crt.sh/?Identity=%25&iCAID=89909

so the number of SSL certificates we issue each year is very limited.

This statement is definitely true, as I count just 512 certificates issued by Izenpe in 2019. That makes me wonder why Mozilla users should accept the risk of trusting Izenpe considering the low value it provides. Being a tiny CA is not an excuse for serious incidents like this one - rather, tiny CAs should have to work extra hard to demonstrate that they provide enough value relative to the risk of trusting them. I don't think Izenpe has done so.

Furthermore, this is not the first time Izenpe has issued certificates without proper domain validation.

In 2016 (which was after issuance to internal names was prohibited), Izenpe issued a certificate for a domain ending in .jaso, which is not a real TLD: https://crt.sh/?sha256=479D739873AA2C76552772E2D60BCE7268D404941D457E3287ECE9107E56BF3A

Later in 2016, Izenpe issued this certificate for test.ssl.com, which I assume was not authorized by ssl.com: https://crt.sh/?sha256=C131168DA2E0EC2D6D9C85DEB76D7CFB80CFDE840463B8F140351BBE5AAF0EC1

Flags: needinfo?(o-garcia)

A quick manual search of censys also shows a precertificate issued for an invalid TLD on the 6th July 2020 https://crt.sh/?q=7adf14b9cb56ccb11a221937593fe4f7338d650506409604a87bde6fe0d2147a

Nevermind that's the certificate in this incident, I should really learn to read more! Please ignore.

(In reply to Andrew Ayer from comment #13)

As indicated above, before we removed the possibility to send us the request form using a different path to the web application there was a validation team that checked the CAA (among all the rest of validations) using https://toolbox.googleapps.com/apps/dig/#CAA/. In 99% of cases there was not any CAA record, the validation was correct. In the rest of cases (1%) where there was a CAA record, it was escalated to the CSO as an internal incident, and resolved by him. So only two people (CSO and his backup) had to know the details of CAA

This is not an accurate description of how to validate CAA. There are numerous scenarios where a lookup does not return a CAA record yet the CA is still forbidden to issue. Based on this description, I believe that every certificate issued using this process is misissued and needs to be revoked along with an incident report - as was required of other CAs which issued certificates without properly following the CAA specification. This also calls into question whether Izenpe's automated CAA checking is correct.

Also, could you please provide the numbers (total number of validations, and number of validations that you believe did not have a CAA record) that you used to calculate the 99% figure?

Thanks, Andrew. This is also my understanding and reaction.

Beyond the CAA matter you've highlighted, this is also clearly a case of using a delegated third party to perform validation - the dependency, in this case on Google, is clearly in the role of a service provider performing a portion of the functions that the CA is expected to perform.

This is a gravely serious issue that calls into question 100% of the certificates that have been issued.

Keep in mind that the business of Izenpe is to issue certificates for the Basque local administration,

This statement does not appear to be true, as looking through the certificates from just one intermediate reveals quite a few certificates issued to organizations in France and Iran: https://crt.sh/?Identity=%25&iCAID=89909

so the number of SSL certificates we issue each year is very limited.

This statement is definitely true, as I count just 512 certificates issued by Izenpe in 2019. That makes me wonder why Mozilla users should accept the risk of trusting Izenpe considering the low value it provides. Being a tiny CA is not an excuse for serious incidents like this one - rather, tiny CAs should have to work extra hard to demonstrate that they provide enough value relative to the risk of trusting them. I don't think Izenpe has done so.

Furthermore, this is not the first time Izenpe has issued certificates without proper domain validation.

In 2016 (which was after issuance to internal names was prohibited), Izenpe issued a certificate for a domain ending in .jaso, which is not a real TLD: https://crt.sh/?sha256=479D739873AA2C76552772E2D60BCE7268D404941D457E3287ECE9107E56BF3A

Later in 2016, Izenpe issued this certificate for test.ssl.com, which I assume was not authorized by ssl.com: https://crt.sh/?sha256=C131168DA2E0EC2D6D9C85DEB76D7CFB80CFDE840463B8F140351BBE5AAF0EC1

Ben: I think this is worth careful consideration here, given the severity of the failure here.

Flags: needinfo?(bwilson)

(In reply to Andrew Ayer from comment #13)

As indicated above, before we removed the possibility to send us the request form using a different path to the web application there was a validation team that checked the CAA (among all the rest of validations) using https://toolbox.googleapps.com/apps/dig/#CAA/. In 99% of cases there was not any CAA record, the validation was correct. In the rest of cases (1%) where there was a CAA record, it was escalated to the CSO as an internal incident, and resolved by him. So only two people (CSO and his backup) had to know the details of CAA

This is not an accurate description of how to validate CAA. There are numerous scenarios where a lookup does not return a CAA record yet the CA is still forbidden to issue. Based on this description, I believe that every certificate issued using this process is misissued and needs to be revoked along with an incident report - as was required of other CAs which issued certificates without properly following the CAA specification. This also calls into question whether Izenpe's automated CAA checking is correct.

As indicated in Comment #3, four certificates were validated manually in the last two years. For these four we used https://toolbox.googleapps.com/apps/dig/ to search for the CAA entry. The process to validate the CAA for these four was the following:

  1. The compliance team recursively checks all domain names, looking for a CAA entry, and records evidence of the validation. We follow this sequence:
    a. X.Y.Z
    b. CNAME/DNAME(X.Y.Z) if exists
    c. Y.Z
    d. CNAME/DNAME(Y.Z) if exists
    e. Z
    f. CNAME/DNAME(Z) if exists
  2. If they find any CAA entry, they report it to the CSO (or his backup) as an incident. The CSO returns an OK or NOK.
  3. If it’s OK, the request goes to the issuance operator. If more than 8 hours have passed since the CAA validation, the request is sent back to the compliance team.

All these certificates are under .eus, and DNSSEC is enabled for .eus, but not for any of the child domains.
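
The lookup sequence in step 1 can be sketched as follows. This is an illustration of the ordering only: `lookup_caa` and `lookup_alias` are hypothetical callables standing in for DNS queries, and the sketch assumes every query succeeds. A lookup failure is not the same as an empty answer, so a real checker must not read a failed query as "no CAA record".

```python
from typing import Callable, Optional


def parent_names(fqdn: str) -> list[str]:
    """Return the name and each of its parents: X.Y.Z, then Y.Z, then Z."""
    labels = fqdn.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def find_caa(
    fqdn: str,
    lookup_caa: Callable[[str], list[str]],
    lookup_alias: Callable[[str], Optional[str]],
) -> Optional[tuple[str, list[str]]]:
    """Walk the sequence and return the first name with CAA records, if any."""
    for name in parent_names(fqdn):
        records = lookup_caa(name)
        if records:
            return name, records
        target = lookup_alias(name)  # CNAME/DNAME target, if one exists
        if target:
            records = lookup_caa(target)
            if records:
                return target, records
    return None


# Illustrative zone data only; not real DNS contents.
caa_data = {"barakaldo.eus": ['0 issue "izenpe.eus"']}
result = find_caa(
    "www.barakaldo.eus",
    lambda name: caa_data.get(name, []),
    lambda name: None,
)
print(result)
```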

In case of our automated CAA check (that we used for the rest of certificates), we use a generic add-in provided by our PKI provider Safelayer, it’s not a specific development made for us, and we don’t use any software by a third party. The product add-in searches recursively for each domain name, looking for a CAA RR. It could happen (copy&paste from the product manual):

  • The domain doesn’t have CAA RR entries: the application recursively looks at parent domains
    o If it finds a CAA RR entry, it takes the actions described in the following sections
    o If it reaches the top-level domain without finding a CAA RR entry, it continues processing the request
  • The domain has a CAA RR entry without issue or issuewild
    o If the domain has a CAA RR entry without issue or issuewild tags, it continues processing the request
  • Possible errors
    o Lookup errors: if it gets no answer from the DNS server, it tries three times. If it gets a DNS answer with a status other than NOERROR or NXDOMAIN, it tries again
    o Critical flags: the product denies the request and shows an error if the CAA RR has the critical flag enabled on a tag that is not issue, issuewild or iodef
  • Processing of CNAME/DNAME and DNSSEC: the processing of CNAME/DNAME and DNSSEC records is the responsibility of the DNS server. As indicated in RFC 4035, the DNS server must respond with a SERVFAIL status when DNSSEC validation fails
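The decision rules quoted from the manual can be illustrated with a minimal sketch. It is not Safelayer's implementation: `CaaRecord`, `caa_allows`, the `lookup` callback and the issuer identifier `ca.example` are all hypothetical, and the step that matches `issue`/`issuewild` values against the CA's identifier follows RFC 8659 rather than anything stated in this thread.

```python
from dataclasses import dataclass
from typing import Callable, List

KNOWN_TAGS = {"issue", "issuewild", "iodef"}
CRITICAL = 0x80  # issuer-critical flag bit (RFC 8659)


@dataclass(frozen=True)
class CaaRecord:
    flags: int
    tag: str
    value: str


def caa_allows(
    domain: str,
    ca_id: str,
    lookup: Callable[[str], List[CaaRecord]],
    wildcard: bool = False,
) -> bool:
    """Climb from `domain` towards the TLD and apply the first CAA set found."""
    labels = domain.rstrip(".").lower().split(".")
    records: List[CaaRecord] = []
    for i in range(len(labels)):
        records = lookup(".".join(labels[i:]))
        if records:
            break
    if not records:
        return True  # reached the TLD without any CAA RR: continue
    for rec in records:
        # An unrecognised tag with the critical flag set blocks issuance.
        if rec.flags & CRITICAL and rec.tag.lower() not in KNOWN_TAGS:
            return False
    issue = [r for r in records if r.tag.lower() == "issue"]
    issuewild = [r for r in records if r.tag.lower() == "issuewild"]
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # CAA set without issue/issuewild: continue
    # Assumption from RFC 8659: the issuer domain precedes any ';' parameters.
    return any(r.value.split(";")[0].strip().lower() == ca_id for r in relevant)
```

Passing the resolver in as a callback keeps the sketch independent of any DNS library; a real check would also need the retry and DNSSEC handling described in the list above.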

Also, could you please provide the numbers (total number of validations, and number of validations that you believe did not have a CAA record) that you used to calculate the 99% figure?

Number of cases where there was a CAA entry

  • Manual: none of those four cases had any CAA or CNAME/DNAME entry
  • Automatic: we are preparing a report with the result of analysing all certificates issued by Izenpe since CAA validation became mandatory in the CABForum (September 8th, 2017). The objective is to have statistics about how many CAA entries we find, and in which cases DNSSEC is enabled. We spoke of 99% in a figurative sense

Keep in mind that the business of Izenpe is to issue certificates for the Basque local administration,

This statement does not appear to be true, as looking through the certificates from just one intermediate reveals quite a few certificates issued to organizations in France and Iran: https://crt.sh/?Identity=%25&iCAID=89909

Some time ago we had two specific projects to issue certificates. Since last November 15th, we have had a policy of not issuing SSL certificates to entities outside Spain.

so the number of SSL certificates we issue each year is very limited.

This statement is definitely true, as I count just 512 certificates issued by Izenpe in 2019. That makes me wonder why Mozilla users should accept the risk of trusting Izenpe considering the low value it provides. Being a tiny CA is not an excuse for serious incidents like this one - rather, tiny CAs should have to work extra hard to demonstrate that they provide enough value relative to the risk of trusting them. I don't think Izenpe has done so.

Izenpe is a small company created to provide services to the Basque Public Administration, including SSL/TLS certificates. We do not have the resources of the big CAs, but naturally we are subject to the same requirements. From the beginning, we have worked hard to meet all requirements from the CABForum, European legislation and national legislation. Although it is what is expected of a CA, each time we have found an issue we have reported it in Bugzilla; in that sense we have been fully transparent.

Furthermore, this is not the first time Izenpe has issued certificates without proper domain validation.

In 2016 (which was after issuance to internal names was prohibited), Izenpe issued a certificate for a domain ending in .jaso, which is not a real TLD: https://crt.sh/?sha256=479D739873AA2C76552772E2D60BCE7268D404941D457E3287ECE9107E56BF3A

Later in 2016, Izenpe issued this certificate for test.ssl.com, which I assume was not authorized by ssl.com: https://crt.sh/?sha256=C131168DA2E0EC2D6D9C85DEB76D7CFB80CFDE840463B8F140351BBE5AAF0EC1

We are doing our best not to repeat our mistakes. As evidence of that, we’ve added some controls to our PKI software, and we’re in a project to integrate our web application into our PKI system to reduce the risk of manual errors. It’ll be in production by the end of September.

Flags: needinfo?(o-garcia)

Ryan, you're right that we used the Google dig online service to search for CAA/CNAME entries. That online service is a third party, so we've revoked the four affected certificates. You can check them here:

www.barakaldo.eus https://crt.sh/?id=3052537874
www.zestoa.eus https://crt.sh/?id=2974182442
www.aulesti.eus https://crt.sh/?id=1665344033
www.astigarraga.eus https://crt.sh/?id=2964904131

All those customers have been notified.
Thanks

Oscar: 100% of certificates you checked using the algorithm described in Comment #17 are misissued. It's not just the ones you positively checked for CAA - the use of a third-party service to check any of the DNS records is, itself, the use of a delegated third-party.

There seems to have been a misunderstanding about my explanation in comment #17, so I will try to explain more accurately.
The CAA check is made with the PKI software provided by our recognized software provider, Safelayer. It provides us with the PKI software as well as the add-in we use in the CA server, which connects directly to the client's DNS server, the one that holds the DNS name. It works this way (taken from the product manual):
  • The domain doesn’t have CAA RR entries: the application recursively looks at parent domains
    o If it finds a CAA RR entry, it takes the actions described in the following sections
    o If it reaches the top-level domain without finding a CAA RR entry, it continues processing the request
  • The domain has a CAA RR entry without issue or issuewild
    o If the domain has a CAA RR entry without issue or issuewild tags, it continues processing the request
  • Possible errors
    o Lookup errors: if it gets no answer from the DNS server, it tries three times. If it gets a DNS answer with a status other than NOERROR or NXDOMAIN, it tries again
    o Critical flags: the product denies the request and shows an error if the CAA RR has the critical flag enabled on a tag that is not issue, issuewild or iodef
  • Processing of CNAME/DNAME and DNSSEC: the processing of CNAME/DNAME and DNSSEC records is the responsibility of the DNS server. As indicated in RFC 4035, the DNS server must respond with a SERVFAIL status when DNSSEC validation fails
We do not use any delegated party or third-party software to validate those CAA entries; we check directly with the DNS server.
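The "Lookup errors" rule can be pictured with a short retry sketch. The names `query_with_retries`, `CaaLookupError` and the `query` callback are hypothetical, and two details are assumptions rather than facts from the manual: that answers with other status codes share the same three-attempt budget as missing answers, and that the check fails closed once the budget is spent.

```python
from typing import Callable, List, Tuple

FINAL_RCODES = {"NOERROR", "NXDOMAIN"}


class CaaLookupError(Exception):
    """Raised when no usable DNS answer is obtained."""


def query_with_retries(
    name: str,
    query: Callable[[str], Tuple[str, List[str]]],
    attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Call `query` until it returns NOERROR or NXDOMAIN, up to `attempts` times."""
    last = "no answer"
    for _ in range(attempts):
        try:
            rcode, records = query(name)
        except TimeoutError:
            last = "timeout"  # no answer from the DNS server: try again
            continue
        if rcode in FINAL_RCODES:
            return rcode, records
        last = rcode  # e.g. SERVFAIL after a failed DNSSEC validation
    # Assumption: fail closed rather than continue with issuance.
    raise CaaLookupError(f"CAA lookup for {name} failed after {attempts} attempts: {last}")
```

Under these assumptions, a resolver that keeps returning SERVFAIL (the RFC 4035 response when DNSSEC validation fails) ends in `CaaLookupError` instead of a silent pass.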

On the other hand, until last July 7th we had an alternative to the web application as a contingency, for cases in which the client was unable to access the application for some technical reason. In those cases the validation and issuance process was:
1.- The validation team receives the email. That email encloses the signed PDF with the request form and the CSR
2.- The validation team verifies all information in the request form against trusted sources, according to the BRs/EVGs. Evidence for each validation is registered in a document, which is digitally signed by the validation operator. In those cases, we used https://toolbox.googleapps.com/apps/dig/ as a tool to search for the CAA entry. The process to validate the CAA for these four was the following:

  1. The compliance team recursively checks all domain names, looking for a CAA entry, and records evidence of the validation. We follow this sequence:
    a. X.Y.Z
    b. CNAME/DNAME(X.Y.Z) if exists
    c. Y.Z
    d. CNAME/DNAME(Y.Z) if exists
    e. Z
    f. CNAME/DNAME(Z) if exists
  2. If they find any CAA entry, they report it to the CSO (or his backup) as an incident. The CSO gives them back the OK or NOK.
  3. If it’s OK, the request goes to the issuance operator. If more than 8 hours have passed since the CAA validation, the request is sent back to the compliance team.
    All these certificates are under .eus, and DNSSEC is enabled in .eus, but not in any of the child domains.

We processed the four certificates in comment #3 using this alternative way. We considered the Google dig tool a trusted source for searching CAA entries, but you are right that we did not realize we were using third-party software to check the DNS records. Therefore, we have revoked the four certificates for which we used manual validation, and we no longer allow certificates to be requested that way.

Hi Oscar,
In Comment #17 you state, "we’re in a project to integrate our web application into our PKI system to reduce the risk of manual errors. It’ll be in production by the end of September". How is that project coming along? What are some of the controls that will be implemented to reduce manual errors?
Thanks,
Ben

Flags: needinfo?(bwilson) → needinfo?(o-garcia)

Hi Ben, we expect to meet the deadline for having the integration of the web application with our PKI system in production, so we'll have it by next September 30th. With this integration the issuance operator won't have to access a repository to get the CSR to issue; the operator will just push a button (obviously after successfully authenticating with two-factor ID). The process will be:

  • The application form and CSR are received from the web application
  • Automatic validations are made. If there is any incident with the validation, it's forwarded to the validation team
  • In all cases, all information is verified by the validation team. For a Qualified/EV certificate, validation is approved by two people
  • A new CSR is generated automatically by the web application, taking the validated information and the public key from the CSR provided by the customer
  • An operator from the issuance team (different from the validation team) checks that the verified information is the same as that contained in the final CSR. If it's correct, he pushes a button to issue the certificate. At that moment the CAA check is done automatically by the PKI system.

The difference from the current scenario is that the issuance operator won't have to manually select any CSR, so the risk of choosing an incorrect CSR is reduced.
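The CSR rebuild step addresses the root cause of this incident, where the CSR carried different information from the application form. The data flow can be sketched as below. This is not a real PKCS#10 implementation and not Izenpe's code: `CertRequest` and `rebuild_request` are hypothetical names, and the public key is treated as opaque bytes.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CertRequest:
    public_key: bytes            # opaque key material from the customer CSR
    subject: str
    dns_names: Tuple[str, ...]


def rebuild_request(
    customer_csr: CertRequest,
    validated_subject: str,
    validated_dns_names: Iterable[str],
) -> CertRequest:
    """Keep only the customer's public key; every other field comes
    from the validated application form."""
    return CertRequest(
        public_key=customer_csr.public_key,
        subject=validated_subject,
        dns_names=tuple(validated_dns_names),
    )
```

With this shape, a name that appears only in the customer's CSR (for instance an internal host name) never reaches the final request, because nothing but the key is copied across.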

Flags: needinfo?(o-garcia)
Whiteboard: [ca-compliance] → [ca-compliance] Next Update 1-October-2020

Hi Ben, sorry. We have had a delay in our process of deploying and testing the application. We have agreed with our software developer on October 19 as the new end date.

We have had a delay in our process of deploying and testing the application

While useful to understand that there was a delay, can you be more descriptive about the cause of the delay, and when it was known? I'm trying to better understand what changed between Comment #22 and Comment #23. In general, we want to ensure information is shared as it's become known.

In particular, I want to make sure that the October 19 date mentioned in Comment #23 doesn't further slip.

Flags: needinfo?(o-garcia)

Hi Ryan,
We were meeting our milestones, but when we tested the changes in our development environment, we found some issues that needed the help of our PKI software manufacturer. That delayed the deployment in production, but now those issues are fixed, and we're sure that it'll be in production by next October 19th.

Flags: needinfo?(o-garcia)

(In reply to Oscar Garcia from comment #25)

Hi Ryan,
We were meeting our milestones, but when we tested the changes in our development environment, we found some issues that needed the help of our PKI software manufacturer. That delayed the deployment in production, but now those issues are fixed, and we're sure that it'll be in production by next October 19th.

Oscar, while it's useful to know that you "found some issues", can you be more descriptive about the nature of the issues, and what happened?

Every incident report serves several purposes:

  • To help the community understand what went wrong
  • To help the community understand how to prevent similar issues
  • To help the community understand whether or not the CA is or remains trustworthy

To better set expectations, we're looking to ensure that every incident is as sufficiently detailed as the DigiNotar Report, because it's those details that help us build robust systems. So I'm hoping you can be both precise and descriptive, recognizing there's no level of detail that is too great to share, in communicating the delays and issues being faced here.

Flags: needinfo?(o-garcia)

Ryan, it's not a security incident, nor a threat. The thing is that our PKI system is not designed to be integrated with third-party systems (it's software designed from a security point of view, a CC EAL4+ certified product), so the manufacturer had to give us an interface to integrate our web application with it. The problem in this case was a mistake in some definitions in the developer's document, specifically in a parameter of the workflow that issues our "Sede Cualificado" profile.

Finally we have the correct definition, so we could finish the integration in our test environment. We've almost finished all tests, so we hope to have it in production by the end of next week.
Thanks

Flags: needinfo?(o-garcia)

All the changes to integrate the web application with the PKI system are now in the production environment. We have developed two applications: one for the customer, where all requests are registered, and another for Izenpe operators. The internal frontend is able to issue the certificate once all validations are correct. A qualified certificate with two authentication factors is required to access both applications.
The system manages all evidence in the life cycle of each certificate.
This change also means that all validations are made automatically, so we've reduced the risk of human error.

Thanks

Whiteboard: [ca-compliance] Next Update 1-October-2020 → [ca-compliance] Next Update 2020-12-01

Will the system prevent this type of internal name issuance in the future? Are there any further updates? Does the completion of the integration project mean that the remediation of this incident is complete?

Flags: needinfo?(o-garcia)

The only way to request a certificate is using our web application, and it's fully integrated with the PKI software to be able to issue the certificate. We currently have controls at different layers.

Preventive controls:

  • The web application has a filter that checks that all dnsNames are external domains when the customer fills in a request form. If a name is not under a TLD in ICANN's TLD list, we don't allow the customer to continue
  • All domain authorization or control validations and the certificate issuance are managed automatically by the web application. A certificate can't be issued unless domain control has first been verified using one of the methods allowed in our policy (3.2.2.4.2, 3.2.2.4.6, 3.2.2.4.7). If there's an internal dnsName, these validations will fail, preventing issuance.
  • The CSR uploaded by the customer is rebuilt by the application. We keep only the public key and rebuild the CSR with the validated fields from the application form.
  • The PKI software also verifies the dnsNames to check, among other things, that none of them is an internal domain. This way we have two different pieces of software checking the same issue.
  • The PKI software doesn't allow an SSL certificate to be issued directly from the RA console; the request must come from the web application.
  • We have a different environment for internal certificates, and a different channel to request them
  • A qualified certificate, following the EV guidelines, must first be verified by two different operators before it is issued.

Detective controls:

  • An email with the results of 3 different linter checks is sent to all members of a trained validation team and to the manager for every SSL certificate issued

With these controls we think that we have reduced the risk of issuing a certificate for an internal domain to the minimum.
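The first preventive control, the TLD filter on dnsNames, might look roughly like the sketch below. It is an assumption-laden illustration rather than the web application's actual filter: `is_public_dns_name` is a hypothetical name, and the TLD set is passed in by the caller (in practice it would be loaded from the published root zone TLD list).

```python
import ipaddress
from typing import Iterable


def is_public_dns_name(dns_name: str, public_tlds: Iterable[str]) -> bool:
    """Accept a dnsName only if it is a multi-label name under a listed TLD."""
    name = dns_name.strip().rstrip(".").lower()
    try:
        ipaddress.ip_address(name)
        return False  # IP literals are not valid dnsNames
    except ValueError:
        pass
    labels = name.split(".")
    # Single-label names ("intranet") and empty labels are rejected.
    if len(labels) < 2 or any(not label for label in labels):
        return False
    return labels[-1] in {tld.lower() for tld in public_tlds}
```

With a small sample set such as `{"eus", "com", "es"}`, `www.barakaldo.eus` is accepted, while a name ending in `.jaso`, like the 2016 certificate mentioned earlier in this bug, is rejected. A filter like this only catches names under non-existent TLDs; domain control validation is still needed for everything it lets through.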

Flags: needinfo?(o-garcia)
Flags: needinfo?(bwilson)

I am inclined to close this matter as having been adequately addressed by Izenpe. I will close this on or about 22-Jan-2021 unless we receive additional questions, comments or issues that still need to be addressed.

Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED