Closed Bug 1390974 Opened 2 years ago Closed 2 years ago

Actalis: Non-BR-Compliant Certificate Issuance

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kwilson, Assigned: adriano.santoni)

References

Details

(Whiteboard: [ca-compliance])

The following problems have been found in certificates issued by your CA, and reported in the mozilla.dev.security.policy forum. Direct links to those discussions are provided for your convenience.

To continue inclusion of your CA’s root certificates in Mozilla’s Root Store, you must respond in this bug to provide the following information:
1) How your CA first became aware of the problems listed below (e.g. via a Problem Report, via the discussion in mozilla.dev.security.policy, or via this Bugzilla Bug), and the date.
2) Prompt confirmation that your CA has stopped issuing TLS/SSL certificates with the problems listed below.
3) Complete list of certificates that your CA finds with each of the listed issues during the remediation process. The recommended way to handle this is to ensure each certificate is logged to CT and then attach a CSV file/spreadsheet of the fingerprints or crt.sh IDs, with one list per distinct problem.
4) Summary of the problematic certificates. For each problem listed below: number of certs, date first and last certs with that problem were issued.
5) Explanation about how and why the mistakes were made, and not caught and fixed earlier.
6) List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
7) Regular updates to confirm when those steps have been completed.

Note Section 4.9.1.1 of the CA/Browser Forum’s Baseline Requirements, which states:
“The CA SHALL revoke a Certificate within 24 hours if one or more of the following occurs: …
9. The CA is made aware that the Certificate was not issued in accordance with these Requirements or the CA’s Certificate Policy or Certification Practice Statement; 
10. The CA determines that any of the information appearing in the Certificate is inaccurate or misleading; …
14. Revocation is required by the CA’s Certificate Policy and/or Certification Practice Statement; or 
15. The technical content or format of the Certificate presents an unacceptable risk to Application Software Suppliers or Relying Parties (e.g. the CA/Browser Forum might determine that a deprecated cryptographic/signature algorithm or key size presents an unacceptable risk and that such Certificates should be revoked and replaced by CAs within a given period of time).”

However, it is not our intent to introduce additional problems by forcing the immediate revocation of certificates that are not BR compliant when they do not pose an urgent security concern. Therefore, we request that your CA perform careful analysis of the situation. If there is justification to not revoke the problematic certificates, then explain those reasons and provide a timeline for when the bulk of the certificates will expire or be revoked/replaced.

We expect that your forthcoming audit statements will indicate the findings of these problems. If your CA will not be revoking the certificates within 24 hours in accordance with the BRs, then that will also need to be listed as a finding in your CA’s BR audit statement.

We expect that your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable. If your CA will not be revoking the problematic certificates as required by the BRs, then we recommend that you also contact the other root programs that your CA participates in to acknowledge this non-compliance and discuss what expectations their Root Programs have with respect to these certificates.


The problems reported for your CA in the mozilla.dev.security.policy forum are as follows:

** Failure to respond within 24 hours after Problem Report submitted
https://groups.google.com/d/msg/mozilla.dev.security.policy/PrsDfS8AMEk/w2AMK81jAQAJ
The problems were reported via your CA’s Problem Reporting Mechanism as listed here:
https://ccadb-public.secure.force.com/mozilla/CAInformationReport
Therefore, if this is the first time you have received notice of the problem(s) listed below, please review and fix your CA’s Problem Reporting Mechanism to ensure that it will work the next time someone reports a problem like this.

** Invalid dnsNames (e.g. invalid characters, internal names, and wildcards in the wrong position)
https://groups.google.com/d/msg/mozilla.dev.security.policy/CfyeeybBz9c/lmmUT4x2CAAJ
https://groups.google.com/d/msg/mozilla.dev.security.policy/D0poUHqiYMw/Pf5p0kB7CAAJ
The Primary Point of Contact for this CA replied that they are on vacation with limited internet access. I agreed that this bug can wait until after they return from vacation.
(In reply to Kathleen Wilson from comment #1)
> The Primary Point of Contact for this CA replied that they are on vacation
> with limited internet access. I agreed that this bug can wait until after
> they return from vacation.

Can you clarify what specifically can wait? For example, the Primary PoC being on vacation does not seem like it releases the CA from their duty to revoke the misissued certificate within 24 hours of learning about it.
(In reply to Kathleen Wilson from comment #1)
> The Primary Point of Contact for this CA replied that they are on vacation
> with limited internet access. I agreed that this bug can wait until after
> they return from vacation.

I disagree that addressing internal server name misissuances can wait.  Issuing for an internal server name is equivalent to issuing for a non-internal server name without proper domain validation.  Other people could be using the same internal server names on their own networks, with certificates from a private CA, under the reasonable assumption that the public CAs in the Mozilla store won't issue for those names.  They would now be at risk from the certificates misissued by Actalis.

Even if the certificates logged to CT are for fairly distinct internal server names, there may be unlogged certificates for more common internal server names such as "mail", which is why not just revocation, but also a thorough investigation, needs to be completed as soon as possible.
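The concern above can be made concrete with a small check for internal server names. The following is a minimal sketch, not the logic of any CA's tooling or of cablint: it assumes that a name with a single label (such as "mail") is internal, and it uses a short, illustrative and non-exhaustive set of suffixes (`ASSUMED_NON_PUBLIC_SUFFIXES`, a hypothetical name) in place of a real lookup against the IANA root zone list.

```python
# Illustrative, non-exhaustive set of suffixes assumed not to be public TLDs.
# A real check would consult the current IANA root zone list instead.
ASSUMED_NON_PUBLIC_SUFFIXES = {"local", "localhost", "internal", "lan", "corp"}


def is_internal_name(dns_name: str) -> bool:
    """Return True if the name looks like an internal server name."""
    name = dns_name.strip().rstrip(".").lower()
    labels = name.split(".")
    # Unqualified names such as "mail" or "NASV0095" have a single label.
    if len(labels) < 2:
        return True
    # Otherwise, flag names whose last label is in the assumed non-public set.
    return labels[-1] in ASSUMED_NON_PUBLIC_SUFFIXES


def internal_names(san_dns_names):
    """Return the subset of SAN dNSName entries that look internal."""
    return [n for n in san_dns_names if is_internal_name(n)]


if __name__ == "__main__":
    print(internal_names(["mail", "www.example.com", "host.local"]))
```

Because the same unqualified name can exist on many unrelated private networks, a single positive result from a check like this is enough to warrant the investigation requested above.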
(In reply to Andrew Ayer from comment #3)
> (In reply to Kathleen Wilson from comment #1)
> > The Primary Point of Contact for this CA replied that they are on vacation
> > with limited internet access. I agreed that this bug can wait until after
> > they return from vacation.
> 
> I disagree that addressing internal server name misissuances can wait. 
> Issuing for an internal server name is equivalent to issuing for a
> non-internal server name without proper domain validation.  Other people
> could be using the same internal server names on their own networks, with
> certificates from a private CA, under the reasonable assumption that the
> public CAs in the Mozilla store won't issue for those names.  They would now
> be at risk from the certificates misissued by Actalis.
> 
> Even if the certificates logged to CT are for fairly distinct internal
> server names, there may be unlogged certificates for more common internal
> server names such as "mail", which is why not just revocation, but also a
> thorough investigation, needs to be completed as soon as possible.

OK. I will let the CA POC know that they need to delegate this to someone who is currently in the office.
We became aware of the problem regarding that particular certificate after receiving a problem report from Jonathan Rudenberg on August 13th. Although we do not commit to monitoring the particular reporting channel used by Rudenberg on a 7x24 basis (see our CPS and website), and despite the especially unfavorable time (mid-August, when most people in Italy are on vacation), we immediately took charge of the problem and started investigating it, following our internal incident handling procedure. We found that the certificate contains some internal names. Although deprecated, that was still permitted at the time when the certificate was issued (September 2015). That certificate should have been revoked in October 2016, as prescribed by the BRs, but for some reason (yet to be ascertained) it wasn't.
On the same day, August 13, we contacted the customer holding that certificate, namely the well-known Italian oil company ENI, to explain the situation. ENI told us that they cannot replace that certificate on such short notice, because they have to do some development work first, and that revoking the current certificate would disrupt their operations at some overseas sites. They committed to replacing the current certificate on September 14, so we will revoke that certificate immediately afterwards. Meanwhile, we are also investigating internally to find out whether any other certificate with this defect is still active; we expect to have the full picture in a few days. We expect them to be very few, as in late 2015 we adopted technical controls to prevent issuance of certificates containing internal names.

I will provide further information and answers to questions 1 to 7 in the following few days.
(In reply to ADRIANO SANTONI from comment #5)
> ENI told us that they cannot replace that certificate on such short notice,
> because they have to do some development work first, and that revoking the
> current certificate would disrupt their operations at some overseas sites.
> They committed to replacing the current certificate on September 14, so we
> will revoke that certificate immediately afterwards.

For the reasons enumerated by Andrew in comment 3, I don't believe that there is sufficient justification for ignoring the 24-hour revocation rule in this case. This certificate should have been revoked on August 14th at the latest, and it and all others with a similar issue that you find should be revoked immediately.
I concur with Comment #6 and Comment #3, and believe that timely revocation is needed.

Adriano, when can we expect a follow-up per Comment #5?
Here's the follow-up to comment #5, replying to the 7 original questions:

1) How your CA first became aware of the problems listed below (e.g. via a Problem Report, via the discussion in mozilla.dev.security.policy, or via this Bugzilla Bug), and the date.

We first became aware of the problem on receiving an email from Jonathan Rudenberg on August 13th, sent to our certificate problem reporting mailbox, as already described above.

2) Prompt confirmation that your CA has stopped issuing TLS/SSL certificates with the problems listed below.

We stopped issuing certificates with internal names in November 2015, as prescribed by the BRs.

3) Complete list of certificates that your CA finds with each of the listed issues during the remediation process. The recommended way to handle this is to ensure each certificate is logged to CT and then attach a CSV file/spreadsheet of the fingerprints or crt.sh IDs, with one list per distinct problem.

We found none, apart from the single certificate (already logged to CT) reported by Jonathan. We scanned the entire corpus of certificates that we issued, using both our internal tools and CABLINT, and did not find any other active certificates with internal names in them.

4) Summary of the problematic certificates. For each problem listed below: number of certs, date first and last certs with that problem were issued.

Just the one reported by Jonathan, and no other.

5) Explanation about how and why the mistakes were made, and not caught and fixed earlier.

We failed to revoke that certificate by October 2016, as was required by the BRs. Based on our investigations, it seems that this was caused by human error in configuring one of our software tools. We have a scheduled job that periodically scans our database in search of any possible non-compliant certificates. This check is done via a tool (a sort of linter) that applies rules defined in a configuration file. We discovered a subtle configuration error that likely prevented the certificate under discussion (the one reported by Jonathan) from being detected.

6) List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

We decided to also use an additional linter in performing our BR-compliance checks (see question 5); in particular, we decided to use cablint. We plan to implement this by the end of September.

Apart from that, since November 2015 both our certificate management platform and our operating procedures have prevented certificate requests containing internal names from being accepted. In fact, we have not issued any certificates with internal names since then, as far as we can tell. And with reference to comment #3, we never issued any certificate with generic unqualified names such as "mail" or the like.

7) Regular updates to confirm when those steps have been completed.

Will report here on our progress with the remedial action described above.
(In reply to ADRIANO SANTONI from comment #8)
> We found none, apart from the single certificate (already logged to CT)
> reported by Jonathan. We scanned the entire corpus of certificates that we
> issued, using both our internal tools and CABLINT, and did not find any
> other active certificates with internal names in them.

Did you find any other error-level messages from cablint that were triggered by other certificates?
(In reply to ADRIANO SANTONI from comment #8)
> Here's the follow-up to comment #5, replying to the 7 original questions:

Thanks for these details. It is encouraging to see a description of the controls you already had in place that looked for and anticipated problems, and while these controls failed, it sounds like the remediation will partly address this.

Some follow-ups:
- https://crt.sh/?id=11354982&opt=cablint does not appear revoked yet. Per Comment #7, Comment #6, and Comment #3, delays in revocation for this particular issue pose risk to the ecosystem, and thus are not acceptable.

(In reply to ADRIANO SANTONI from comment #8)
> And with
> reference to comment #3, we never issued any certificate with generic
> unqualified names such as "mail" or the like.

- https://crt.sh/?id=11354982&opt=cablint shows this is not correct, as the certificate contains 6 generic, unqualified names: NASV0095, NASV0096, NASV0402, NASV0404, NASV0120, NASV0122. Can you please clarify?

>  We discovered a subtle configuration error that likely prevented the certificate in discussion (the one reported by Jonathan) from being detected.

In terms of remediation, what changes are you planning that would help detect such configuration errors in the future? For example, have you developed a testing methodology to ensure changes are correct and enforced? Have you developed a 'configuration validator' tool to make sure it's properly configured? A process of "change review" to ensure multiple eyes validate the correctness of the configuration?

This is not an attempt to ascribe blame, but to understand what systemic issues allowed for a malformed configuration, and how improvements can be made to prevent that.
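One way to picture the "configuration validator" idea raised above is a canary test: before the production rule set is trusted, it is run against known-bad inputs, and any canary that is not flagged is reported as a configuration failure. The sketch below is only an illustration under stated assumptions. The rule names, the `Canary` type, and `validate_config` are hypothetical and do not describe Actalis' tool or cablint.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A rule takes the dNSName entries of a certificate and returns True if it fires.
Rule = Callable[[List[str]], bool]


@dataclass(frozen=True)
class Canary:
    """A known-bad input and the rule that is expected to flag it."""
    label: str
    dns_names: List[str]
    expected_rule: str


def rule_unqualified_name(dns_names: List[str]) -> bool:
    # Fires when any name has no dot, e.g. "mail" or "NASV0095".
    return any("." not in n.strip(".") for n in dns_names)


def rule_misplaced_wildcard(dns_names: List[str]) -> bool:
    # Fires when "*" appears anywhere other than as the whole left-most label.
    return any(
        "*" in n and not (n.startswith("*.") and "*" not in n[2:])
        for n in dns_names
    )


def validate_config(enabled_rules: Dict[str, Rule], canaries: List[Canary]) -> List[str]:
    """Return a list of failures; an empty list means every canary was caught."""
    failures = []
    for canary in canaries:
        rule = enabled_rules.get(canary.expected_rule)
        if rule is None:
            failures.append(f"{canary.label}: rule '{canary.expected_rule}' is not enabled")
        elif not rule(canary.dns_names):
            failures.append(f"{canary.label}: rule '{canary.expected_rule}' did not fire")
    return failures


CANARIES = [
    Canary("internal-name canary", ["NASV0095"], "unqualified_name"),
    Canary("wildcard canary", ["www.*.example.com"], "misplaced_wildcard"),
]
```

Running `validate_config` against the configuration actually loaded in production, rather than only in a test environment, is the point of the exercise: a rule that was silently disabled or mis-scoped shows up as a failed canary.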
Flags: needinfo?(adriano.santoni)
As to the revocation of the certificate under discussion: that certificate cannot be revoked right now, because it is necessary for the operation of the centralized company communications platform of ENI's Nigerian subsidiary NAOC (https://www.eni.com/en_NG/eni-in-nigeria/eni-profile/eni-profile.shtml), based on Microsoft Exchange ActiveSync Server, an architecture built directly by Microsoft for ENI. Revoking the current certificate before NAOC has reconfigured their systems so that they can install a new certificate without internal names would cause the sudden interruption of all intra-corporate communications of NAOC, resulting in complete disruption of their operations and negative impacts even on the safety of field personnel, as in that country much of the work is carried out by staff at remote sites (oil fields). As already mentioned, ENI is committed to resolving the situation by September 14 and, if necessary, can provide us with a formal commitment signed by their CISO. That timing depends on the need to make changes to networking configurations and to the ActiveSync server configuration, activities for which NAOC is not autonomous and must be supported by Microsoft Exchange ActiveSync experts. Actalis has been in constant contact with ENI on this issue since August 13th, and is monitoring the progress on a day-to-day basis.

Regarding our remediation actions (apart from the adoption of CABLINT), I will update you shortly. We have always followed our ISO 9001 + ISO 27001 compliant company procedures for testing, deploying, and change management, and we are still unsure at what stage and how that problem (the subtle configuration error that I mentioned in comment #8) arose. We are talking about software deployed in early 2015, and since then some developers and system operators have left our company, and new staff have been hired, so it's not a very quick and easy investigation.

With reference to "generic unqualified names", what I wanted to say is just that we never issued certificates with very common, user-friendly and easily re-usable internal names such as "mail" or "owa" or the like. But that changes nothing, essentially. Inserting internal names in publicly trusted certificates was a risky practice nonetheless. That's precisely why we seldom did that, and in Nov 2015 we stopped as required by the BRs.
(In reply to ADRIANO SANTONI from comment #11)
> Inserting internal names in publicly trusted
> certificates was a risky practice nonetheless. That's precisely why we
> seldom did that, and in Nov 2015 we stopped as required by the BRs.

From the very first version of the Baseline Requirements, CAs have been required to revoke all such certificates by November 2016. Thus, this requirement should not come as a surprise to Actalis or NAOC.

1) What steps did Actalis take to ensure compliance with these requirements, beyond a (misconfigured) tool?
  a) Did you test such tool?
  b) Did you design any other mitigations/controls?
  c) Did you manually determine how many such certificates you had issued?
2) What proactive communication did Actalis take to reach out to affected customers prior to November 2016? That is, given your explanation of your configured tool, this should have already been something you were proactively communicating to customers such as NAOC, as you stated you knew compliance was required.

Understandably, Actalis made multiple serious mistakes (misconfiguring the tool, failure to detect the misconfiguration until publicly tested, failure to revoke as required), but it's deeply surprising to hear Actalis represent that three additional weeks in total are required for this customer to reconfigure their network, given that there should have been multiple proactive communications to prepare and transition this customer. Understanding a timeline of how this incident was handled, from the deployment of the BRs to present, is critical to understanding and assessing Actalis' ability to be trusted to follow the BRs going forward.
> From the very first version of the Baseline Requirements, CAs have been
> required to revoke all such certificates by November 2016. Thus, this
> requirement should not come as a surprise to Actalis or NAOC.

We and our customers have always been aware of the requirement to revoke all such certificates by November 2016. 
It's just this single certificate that unfortunately escaped our attention because of a human error.

> 1) What steps did Actalis take to ensure compliance with these requirements, beyond a (misconfigured) tool?
>  a) Did you test such tool?

Of course we did, in keeping with our internal software development procedures which are ISO 9001 compliant. 
Regrettably, an unfortunate combination of test data and a subtle tool configuration error caused this one particular 
certificate not to be detected in our production environment, although the "internal server names" and other 
non-compliance detection functionalities were successfully verified in our test environment.

>  b) Did you design any other mitigations/controls?
 
In addition to the above mentioned tool, we did implement two more controls:
- a software control such that, since November 2015, our CA refuses certificate requests containing internal names;
- a procedural control: our SSL certificate issuance operating procedure has always expressly prohibited issuing certificates with internal names past November 2015.

>  c) Did you manually determine how many such certificates you had issued?

Yes, we did. However, we used an interactive interface that was based upon the same checking tool, so the specific certificate in discussion was not detected. Our tool detected a total of 7 certificates with internal names, all issued to the same customer (ENI), all of which were due to expire before November 2016 and therefore did not require a pre-emptive revocation. This figure is confirmed by our recent scans made with cablint.

> 2) What proactive communication did Actalis take to reach out to affected
> customers prior to November 2016? That is, given your explanation of your
> configured tool, this should have already been something you were proactively
> communicating to customers such as NAOC, as you stated you knew compliance was
> required.

We have been informing our customers in several ways:
- by specifying the rules on internal names in our CPS, in keeping with the BRs;
- by publishing a warning message on our web site, in keeping with the BRs;
- by immediately contacting the affected customers, as per our internal procedure, upon detecting, or being informed of, a non-compliant certificate.

In this specific case, since the only affected customer was ENI, we also warned ENI in writing (via email) that we would stop issuing certificates with internal names in Nov 2015 and that we would revoke such certificates by Nov 2016. ENI acknowledged our communications on this topic. In practice, we did not have to revoke any ENI certificates with internal names in 2016 because all 7 of them were due to expire before Nov 2016, the only exception being the single certificate under discussion which, for the reasons explained above, unfortunately we did not detect.

> Understandably, Actalis made multiple serious mistakes (misconfiguring the
> tool, failure to detect the misconfiguration until publicly tested, failure to
> revoke as required), but it's deeply surprising to hear Actalis represent that
> three additional weeks in total are required for this customer to reconfigure
> their network, given that there should have been multiple proactive
> communications to prepare and transition this customer. 
 
We have been strongly urging ENI to put more resources on that task so as to complete it much more quickly.
Today ENI has committed in writing to replace the offending certificate by September 2nd, so we will revoke
that certificate by September 2nd (this Saturday) end of business.
 
Since August 13th, when we received the problem report from J. Rudenberg, we have been handling this
issue according to our security incident handling procedure (in keeping with our ISO 27001-compliant ISMS). 

Additional remediations and measures:
- we are implementing a new BR-compliance checking system, based on cablint, the first release of which is forecasted to be deployed by mid September;
- we will also study the feasibility of calling the linter before issuing certificates;
- we will revise our test book and software testing procedures in order to enhance them.
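The second measure above, calling a linter before issuance, can be sketched as a gate in front of the signing step. This is a hedged illustration only: `run_linter` and `sign` are hypothetical callables supplied by the caller, and the parser assumes the linter reports one finding per line with a single-letter severity prefix (for example `E:` for errors and `W:` for warnings), with `E` and `F` treated as blocking. None of this describes Actalis' actual implementation.

```python
# Severities assumed to block issuance (an assumption of this sketch).
BLOCKING_SEVERITIES = {"E", "F"}


class IssuanceBlocked(Exception):
    """Raised when the linter reports at least one blocking finding."""

    def __init__(self, findings):
        super().__init__(f"{len(findings)} blocking lint finding(s)")
        self.findings = findings


def parse_findings(linter_output: str):
    """Parse lines of the form '<letter>: <message>' into (severity, message) pairs."""
    findings = []
    for line in linter_output.splitlines():
        severity, sep, message = line.partition(":")
        severity = severity.strip()
        # Skip lines that do not start with a single-letter severity prefix.
        if sep and len(severity) == 1 and severity.isalpha():
            findings.append((severity.upper(), message.strip()))
    return findings


def gate_issuance(candidate, run_linter, sign):
    """Lint the candidate certificate first; sign only if nothing blocking is found."""
    findings = parse_findings(run_linter(candidate))
    blocking = [f for f in findings if f[0] in BLOCKING_SEVERITIES]
    if blocking:
        raise IssuanceBlocked(blocking)
    return sign(candidate)
```

The design choice here is that the gate fails closed: a blocking finding raises an exception, so the signing callable is never reached, whereas a post-issuance scan can only report a problem after the certificate already exists.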
This morning we revoked the offending certificate.
Update:
- we have deployed, ahead of time, our new certificate compliance checking and alerting system based on cablint;
- we have asked our auditor to inspect our handling of the incident and report on it in their audit statement, 
  published at https://www.actalis.it/documenti-en/actalisca_audit_statement.pdf.
Flags: needinfo?(adriano.santoni)
Summarizing the matter as:

1) Invalid DNS names (internal server names)
 - See Comment #0, Comment #5, Comment #8, Comment #11, Comment #13, Comment #15
 - Root Cause
  - 2016-10-XX - Attempted to run tool to detect internal server names to check. However, tool had a bug and failed to detect outstanding certificate (See Comment #8)
 - Remediation
  - 2015-11-XX - Ceased issuing internal server names (See Comment #8)
  - 2017-09-02 - Revoked certificate (See Comment #14)
  - 2017-09-05 - Deployed new compliance checking and alerting system (post-issuance) (See Comment #13, Comment #15)

Is this correct?

As to the root cause, I am surprised to see the auditor attest that steps have been taken to remedy the root cause. What does Actalis feel the root cause was, and how has that been remediated?
Flags: needinfo?(adriano.santoni)
Your summary is correct.

The root cause was a bug, caused by human error, in our previous compliance checking software, which prevented that particular certificate from being detected; consequently our staff did not receive an alert for that certificate at the right time, and so they did not ask the customer to replace that certificate. That has been remediated by deploying (after careful testing) a new compliance checking task, based on cablint. We have also put new, specific monitoring in place, with automatic alerting to several people in case of anomalies, to make sure that our new compliance checking task is always active and is continuously doing its job.

Our system checks compliance post-issuance, but we have several controls in our CA and RA platforms aimed at preventing issuance of incorrect certificates; therefore, overall, we believe we have implemented sufficient countermeasures for now. At any rate, as I anticipated in Comment #13, we are studying the feasibility of calling the linter before issuing certificates, and will endeavor to do so as soon as possible.
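The monitoring described above, which checks that the compliance task itself is still running, resembles a heartbeat or "dead man's switch" check. The sketch below is an assumption-laden illustration rather than a description of the deployed system: the function name and the 24-hour threshold are hypothetical, and the alerting step is left to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def scan_is_overdue(
    last_success: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed threshold for illustration
) -> bool:
    """Return True if the compliance scan has never succeeded or is too old."""
    if last_success is None:
        return True
    return now - last_success > max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    if scan_is_overdue(now - timedelta(hours=30), now):
        print("ALERT: compliance scan has not completed within the expected window")
```

A check like this alerts on the absence of a successful run, which covers the case where the scanning job stops silently and therefore produces no findings at all.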
Hi Adriano,

(In reply to ADRIANO SANTONI from comment #5)
> We became aware of the problem regarding that particular certificate after
> receiving a problem report from Jonathan Rudenberg, on August 13th. Although
> we do not commit to monitor the particular reporting channel used by
> Rudenberg on a 7x24 basis (see our CPS and website),

Reading Actalis' website:
https://www.actalis.it/products/ssl-certificate.aspx 
it seems that they provide an email address for working hours only, and an Italian phone number for 24/7 reporting. While this is currently not, as far as I am aware, against any policy, Actalis should consider that it's not always easy for people to make international phone calls, and that perhaps a 24/7-monitored email address might be more user-friendly.

Gerv
Thanks for the update.

At this time, I'm going to mark this issue as Resolved, as all proposed mitigations have been met and deployed.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(adriano.santoni)
Resolution: --- → FIXED