Closed Bug 1620561 Opened 4 years ago Closed 3 years ago

Sectigo: Non-revocation of certificates with subject:organizationalUnitName in DV certificates

Categories

(CA Program :: CA Certificate Compliance, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Robin.Alden, Assigned: nick)

References

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])


As discussed in bug #1593776, Sectigo issued a large number of DV SSL certificates which included subject:organizationalUnitName fields.

We believe this to be against the wider interpretation of section 7.1.4.2.2(i) of the CA/B Forum's Baseline Requirements and for that reason we changed our practice so that we no longer include the subject:organizationalUnitName field at all in DV certificates that we issue.

As requested in https://bugzilla.mozilla.org/show_bug.cgi?id=1593776#c7 we are opening this bug regarding Sectigo's decision not to revoke these certificates.

Following the outline from https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation:

"If your CA will not be revoking the certificates within the time period required by the BRs, our expectations are that:

  • The decision and rationale for delaying revocation will be disclosed to Mozilla in the form of a preliminary incident report immediately; preferably before the BR mandated revocation deadline. The rationale must include an explanation for why the situation is exceptional."

As we mentioned in bug #1593776, we think that the language of 7.1.4.2.2.i of the BRs is regrettably unclear in its wording and as written arguably forbids the presence of any OU field value at all in almost any issued certificate, contrary to what we believe to be the intent of that section.
While we acknowledge the requirement to comply with that section, and with all of the other sections of the BRs that are in scope for our operation, this section was so poorly drafted in the initial version of the BRs as to be almost incomprehensible, and it was subsequently further mangled, accidentally, by a well-intentioned ballot. Taken at face value, it would require the revocation of about 10% of all TLS certificates, including 65% of all EV certificates.
We do not believe that it was the intention of the drafters of the original version of this section, nor of those who balloted to update this section, to forbid the use of OU fields.

"* Any decision to not comply with the timeline specified in the Baseline Requirements must also be accompanied by a clear timeline describing if and when the problematic certificates will be revoked or expire naturally, and supported by the rationale to delay revocation."

For the immediate term our intent is to let these certificates expire naturally.
We have undertaken to work within the validation subcommittee of the Server Certificate Working Group of the CA/B Forum to carry out an initial analysis of the use cases of the OU field and to initiate further discussion of them. The aim is to enumerate the potential requirements so that a decision can be made about the intended or permitted purposes of OU fields, with a view to bringing a considered ballot that introduces clear language into the BRs setting out what is and is not permitted in OU fields for (a) DV certificates and (b) OV and EV certificates.

When the requirements are comprehensible we will follow them.

"* That you will perform an analysis to determine the factors that prevented timely revocation of the certificates, and include a set of remediation actions in the final incident report that aim to prevent future revocation delays."

The factors here that prevented practical revocation of these certificates were
a) The BRs require us to revoke where certificates are not issued in accordance with these Baseline Requirements.
b) We firmly believe that it is solely due to a misdrafting of 7.1.4.2.2.i that the use of subject:OrganizationalUnitName attributes is inadvertently and effectively forbidden, because all three of subject:organizationName, subject:givenName, and subject:surname must also be present before a subject:OrganizationalUnitName may be included. This combination of subject attributes does not arise in a validly issued Server Certificate.
c) Since this small section of the requirements is incomprehensible in the context of the current (and historic) web PKI we cannot evaluate whether the issued certificates are in accordance with the incomprehensible and we therefore do not feel bound to revoke in this instance. We regard these as truly exceptional circumstances requiring amendment of the BRs.

Also from bug #1593776:
This incident and other high visibility incidents in 2019 (and 2020) have illustrated how a single codified level of response for all certificates containing BR violations can lead to outcomes that are suboptimal for both relying parties and subscribers. Sectigo plans to drive dialog around establishing categories of BR violation, with appropriate codified responses for each. Our ultimate goal will be to advance a ballot to put these categories into effect.

Assignee: wthayer → Robin.Alden
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [delayed-revocation-leaf]

Robin:

Thanks for providing this. If I can attempt to summarize the rationale here:

  1. We don’t think the BR requirement makes sense
  2. We don’t want to revoke (e.g. the statement of “intent”)
  3. We don’t think we should have to revoke (e.g. the statement, which has been repeatedly rejected on m.d.s.p., of treating levels of incidents)

This is an incredibly disappointing response. It lacks both qualitative and quantitative data, and seems to reflect Sectigo’s increasing apathy towards compliance and incident response. Compare this response to those of other CAs, and you can see just how little Sectigo is actually doing or saying here.

My biggest concern is this: this incident gives me zero confidence that Sectigo is prepared or capable to replace its certificates if and when needed. There seems little in the line of steps Sectigo is taking to actually ensure they can and will abide by the BRs going forward, and instead a seeming suggestion that the correct answer is retroactive indulgences and accepting whatever the CA feels is “not a security issue”.

I am deeply concerned about the number of incidents Sectigo has had, and the poor quality of these incident reports, as they show a lack of efforts by the CA to address compliance systemically, and instead seem to paper things over as human error, confusion, and shrugs. Sectigo can and should be doing better.

I want to encourage you to revisit this report, the timeline for when these certificates will be revoked, and the steps Sectigo will be taking to ensure that it can and will promptly revoke non-compliant certificates going forward. For example, a substantial reduction in lifetime, a switch to total automation for all issuance, or some other change that Sectigo feels addresses the root reasons why it believes the Baseline Requirements are merely suggestions to pick and choose from.

Flags: needinfo?(Robin.Alden)
Type: defect → task

Ryan,
We will revisit this report and further evaluate our response.
There is some useful data we can present about the body of certificates affected and how different variations or aspects of this issue present themselves in the certificates. Also, which certificates have already been revoked and which certificates are known to no longer be in use. It will take us a few days to complete the analysis and present it here.

Sorry for the slow reply again.

We are currently still working through the list of certificates with a view to have them reissued and subsequently revoked.
To date (May-18-2020): 4,679,280 have expired.
5,571,744 will be expired by the end of June.
102,123 certificates have already been replaced and revoked.

A number of the certificates were issued through high-volume customers.
One customer is aiming to have approximately 300,000 certificates replaced and revoked within the next 3 weeks.

Another customer is working to provide a list of revocable certificates that should approximate 700,000 to 1,000,000.
However, around 3 million for the same customer are deployed on consumer devices, and these devices have no way to automatically update or refresh a certificate prior to the 'renewal' processes which take place a few days before the natural expiry of the certificate.
Revoking those certificates will have a significant detrimental impact on end users which cannot be resolved by the end-user or the vendor.
We are continuing to work with this customer, and others, to increase the number that will be revoked.

However, around 3 million for the same customer are deployed on consumer devices, and these devices have no way to automatically update or refresh a certificate prior to the 'renewal' processes which take place a few days before the natural expiry of the certificate.

This is a good opportunity to be asking: What can be done to address this, so that it doesn't happen again in the future?

That is a fair question.
The certificates that we provide to this customer are from a contract negotiated in 2015.
When customers with similar use cases approach us today we steer them towards a private CA, i.e. a closed private certificate ecosystem, that would not relate in any way to the Web PKI.

That's encouraging, but only slightly so. I'm not sure whether to read "we steer them towards" as a "We wouldn't do it" or a "We'd suggest they do this, but possibly negotiate otherwise"

This might seem like haggling, but going back to the original issue, I think the challenge here is understanding what Sectigo is doing, through policy or practice, to make it easier to replace/renew certificates. Sometimes this might be contractual policy (e.g. the proposed changes to the Subscriber Agreement floated in the CABF), technical policy (e.g. issuance using certain API endpoints/implementations), or public policy (e.g. documentation efforts).

For all of this, it's understanding "How can we be sure there isn't a delay next time", and a big part of that is understanding what's been done to address the challenges that have been identified this time.

Robin:

The factors here that prevented practical revocation of these certificates were
a) The BRs require us to revoke where certificates are not issued in accordance with these Baseline Requirements.

Could you elaborate on this? I fail to see how a requirement to revoke is a factor that prevents you from revoking certificates.

b) We firmly believe that it is solely due to a misdrafting of 7.1.4.2.2.i that the use of subject:OrganizationalUnitName attributes is inadvertently and effectively forbidden, because all three of subject:organizationName, subject:givenName, and subject:surname must also be present before a subject:OrganizationalUnitName may be included. This combination of subject attributes does not arise in a validly issued Server Certificate.

I agree that somewhere in time the changes to 7.1.4.2.2(i) have introduced an unexpected conflict, but that does not allow CAs to write their own rules. I will elaborate below.

c) Since this small section of the requirements is incomprehensible in the context of the current (and historic) web PKI we cannot evaluate whether the issued certificates are in accordance with the incomprehensible and we therefore do not feel bound to revoke in this instance. We regard these as truly exceptional circumstances requiring amendment of the BRs.

Historically, starting from BR v1.0.5 with the adoption of ballot 88 [0], subject:organizationalUnitName similarly required only subject:organizationName, subject:localityName, and subject:countryName. This was updated in BR v1.4.1 with Ballot 175 [1] to the current requirements of also requiring subject:givenName and subject:surname.

I believe that the BRs from 1.0.5 through 1.4.0 are perfectly clear in that regard: a meaningful OU also requires a meaningful and validated O, L and C. When checking the ballot, it does indeed look like the change for OU had serious unintended side-effects, and might have erroneously altered section 7.1.4.2.2(i), thereby effectively blocking a meaningful OU from being allowed in certificates.


Your argument seems to be "A section of the requirements has been updated and is now incomprehensible for us, therefore, we do not follow this section". I would expect that at least the previous version would be followed, which was likely more comprehensible as that was written with ballot 88 [0], and was endorsed by then-Comodo employees.

Even in the previous wording from before Ballot 175 and the "incomprehensible requirements", this practise that resulted in this non-revocation was not allowed.


We will revisit this report and further evaluate our response.

Thank you, I'm awaiting this re-evaluation of the report & response.

[0] effective 2012-09-12, https://cabforum.org/2012/09/12/ballot-88-br_9_2_4_errata-iso3166/, OU field restrictions then at BR section 9.2.6
[1] effective 2016-09-07, https://cabforum.org/2016/09/07/ballot-175-addition-given-name-surname/, OU at sect. 7.1.4.2.2(i)

(In reply to Ryan Sleevi from comment #6)

However, around 3 million for the same customer are deployed on consumer devices, and these devices have no way to automatically update or refresh a certificate prior to the 'renewal' processes which take place a few days before the natural expiry of the certificate.

This is a good opportunity to be asking: What can be done to address this, so that it doesn't happen again in the future?

That is a fair question.
The certificates that we provide to this customer are from a contract negotiated in 2015.
When customers with similar use cases approach us today we steer them towards a private CA, i.e. a closed private certificate ecosystem, that would not relate in any way to the Web PKI.

That's encouraging, but only slightly so. I'm not sure whether to read "we steer them towards" as a "We wouldn't do it" or a "We'd suggest they do this, but possibly negotiate otherwise"

This might seem like haggling, but going back to the original issue, I think the challenge here is understanding what Sectigo is doing, through policy or practice, to make it easier to replace/renew certificates. Sometimes this might be contractual policy (e.g. the proposed changes to the Subscriber Agreement floated in the CABF), technical policy (e.g. issuance using certain API endpoints/implementations), or public policy (e.g. documentation efforts).

Firstly, apologies for the 'mis-steer', but I have some more information and a private CA would not have worked for this customer.
These are devices that are accessible over the internet via home and office broadband connections. I believe a similar example from elsewhere is the Plex media servers that use certificates.

There are multiple pieces to this.

System design

Generalizing, if we're in the loop as the customer designs their product, as we seldom are, then we have a real prospect of steering their design along a prudent path given our experience. Conversely, if we are approached by someone who tells us little about their application but wants to enter into a contract to take TLS certificates from us under our standard policy it would be true to say that we do not usually turn them away.
We actively promote the use of private CAs to our customers when they are practical for the customer's use case because of the negative consequences of interactions between (what should be) a closed ecosystem and the web PKI that we have often observed and sometimes experienced.

Technical ability to revoke

We have sole control of the CA from which these end entity certificates are issued. We can revoke all of the affected leaf certificates.

Legal ability to revoke

In general section 3.4 of our Subscriber Agreement, and in particular our contract with this customer, allow us to revoke certificates where we are required to do so by policy. Although the explicitly included revocations have changed over time, the final subsection has been carried forward for a long time and reads "[Sectigo may revoke a Certificate if] the Certificate, if not revoked, will compromise the trust status of Sectigo." It therefore does not give us any contractual or legal issues to revoke these leaf certificates.

Ability to support auto renewal/replacement of the device certificate

These devices trigger the application for their certificates through a centralized service that uses our APIs. They trigger the request of the first certificate as they are initially powered on, and they trigger the request of the follow-on certificates as the initial certificate expires.
In this case the missing piece is the ability for anyone (Sectigo, our customer, or the owner of the desktop device) to trigger this renewal process by any event other than expiry.

Effect on end users

Because this customer is not able to force early renewal of the certificates on the desktop devices sitting in millions of homes and offices, I am led to believe that the effect of our revoking these certificates will be either to brick these devices or to put them in a state requiring a customer service call to restore full functionality.

Shorter lifetime certificates

In future we would require the use of shorter lifetime certificates. We already offer, and this customer uses, APIs through which certificates can be automatically requested and the lifecycle managed by the subscriber, meaning that long duration certificates are not needed since renewal / replacement is a hands-free operation. Some of our highest volume deployments are of 90-day certificates and we are on record as supporting moves toward shorter certificate lifetimes in general. We welcome the changes Apple made in this regard.

We regret that the customer does not have the ability to pull a lever to force early renewal but we have provided the levers and the customer's inability to pull them in this case is not within our control.

With the benefit of hindsight, understanding how this customer ended up using these certificates and seeing now their inability to handle a revocation event, it seems that we must call out to customers not only that their certificates may be revoked as a result of events that they cannot predict or perhaps even understand, but that they need actively to plan for the necessary elements of their system (and in this case that would mean the software on the desktop devices) to cope with an unexpected certificate revocation, probably by triggering a new certificate request and installation, whether that trigger comes from the device, from our customer, or from Sectigo.
Putting that in a more active voice, we would look to see automation of reissuance that can be triggered at any point - simply to be able to deal with situations like this or Heartbleed-type exploits. If this isn't possible then shorter lifetime certificates will be encouraged, even looking toward 30-day lifetimes.
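
The kind of "triggerable at any point" renewal described above could look roughly like the following on the device side. This is a minimal sketch under stated assumptions, not a description of the customer's software or of Sectigo's APIs: the `DeviceCertState` fields, the `reissue_requested` flag (imagined as being set by the vendor's central service), and the 7-day renewal window are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DeviceCertState:
    """Hypothetical view of what a device knows about its certificate."""
    not_after: datetime              # expiry of the installed certificate
    revoked: bool = False            # e.g. learned from a revocation check
    reissue_requested: bool = False  # hypothetical flag set by the vendor's service


def should_request_new_certificate(
    state: DeviceCertState,
    now: datetime,
    renew_window: timedelta = timedelta(days=7),  # assumed value
) -> bool:
    """Renew on revocation or an explicit request, not only near expiry."""
    if state.revoked or state.reissue_requested:
        return True
    # Fallback: the expiry-driven renewal the devices already perform.
    return now >= state.not_after - renew_window


now = datetime(2020, 6, 1)
healthy = DeviceCertState(not_after=datetime(2021, 3, 1))
flagged = DeviceCertState(not_after=datetime(2021, 3, 1), reissue_requested=True)
near_expiry = DeviceCertState(not_after=datetime(2020, 6, 4))

print(should_request_new_certificate(healthy, now))      # False
print(should_request_new_certificate(flagged, now))      # True
print(should_request_new_certificate(near_expiry, now))  # True
```

The point of the sketch is only that expiry becomes one trigger among several; how a device would actually learn of a revocation or a reissue request is left open here.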

(In reply to Matthias from comment #7)

Robin:

The factors here that prevented practical revocation of these certificates were
a) The BRs require us to revoke where certificates are not issued in accordance with these Baseline Requirements.

Could you elaborate on this? I fail to see how a requirement to revoke is a factor that prevents you from revoking certificates.

It is not. My preliminary text should have identified it as a discussion of the factors that we believed prevented revocation rather than a list of factors.

b) We firmly believe that it is solely due to a misdrafting of 7.1.4.2.2.i that the use of subject:OrganizationalUnitName attributes is inadvertently and effectively forbidden, because all three of subject:organizationName, subject:givenName, and subject:surname must also be present before a subject:OrganizationalUnitName may be included. This combination of subject attributes does not arise in a validly issued Server Certificate.

I agree that somewhere in time the changes to 7.1.4.2.2(i) have introduced an unexpected conflict, but that does not allow CAs to write their own rules. I will elaborate below.

c) Since this small section of the requirements is incomprehensible in the context of the current (and historic) web PKI we cannot evaluate whether the issued certificates are in accordance with the incomprehensible and we therefore do not feel bound to revoke in this instance. We regard these as truly exceptional circumstances requiring amendment of the BRs.

Historically, starting from BR v1.0.5 with the adoption of ballot 88 [0], subject:organizationalUnitName similarly required only subject:organizationName, subject:localityName, and subject:countryName. This was updated in BR v1.4.1 with Ballot 175 [1] to the current requirements of also requiring subject:givenName and subject:surname.

I believe that the BRs from 1.0.5 through 1.4.0 are perfectly clear in that regard: a meaningful OU also requires a meaningful and validated O, L and C. When checking the ballot, it does indeed look like the change for OU had serious unintended side-effects, and might have erroneously altered section 7.1.4.2.2(i), thereby effectively blocking a meaningful OU from being allowed in certificates.


Your argument seems to be "A section of the requirements has been updated and is now incomprehensible for us, therefore, we do not follow this section". I would expect that at least the previous version would be followed, which was likely more comprehensible as that was written with ballot 88 [0], and was endorsed by then-Comodo employees.

Even in the previous wording from before Ballot 175 and the "incomprehensible requirements", this practise that resulted in this non-revocation was not allowed.

There are two discontinuities here.

The first is that we now accept that our original interpretation of 7.1.4.2.2(i) as supporting the certificate warranties in 9.6.1 subs 3 and 4 was not its intent.
Even in the BRs from 1.0.5 through 1.4.0, before givenName and surname were added to the mix, the wording of 7.1.4.2.2(i) was horrible. Its meaning may be discerned, but I accept that if we found it hard to understand we should have queried it.

The second is the addition of the requirements for givenName and surname. I suggest these 'broke' 7.1.4.2.2(i).

In the case that the wording is broken, should we follow the letter or the spirit of the BRs?
If we follow the letter, we may not include in an OU anything that refers to the subject or any other entity because we do not include givenName.
If we follow the spirit, for EV and OV we may include in an OU anything that refers to the subject entity.
In that sense, I think most CAs follow the spirit.
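
The difference between the two readings can be made concrete with a small sketch. It relies only on the summary given earlier in this bug: before Ballot 175 an OU that refers to a person or entity required organizationName, localityName and countryName, and Ballot 175 added givenName and surname to that list. The function name, the `reading` parameter and the dictionary representation of a subject are illustrative assumptions, and the sketch deliberately ignores the separate requirement that the information be verified.

```python
# Attribute sets as summarized in this thread (not quoted from the BRs).
PRE_BALLOT_175 = {"organizationName", "localityName", "countryName"}
POST_BALLOT_175 = PRE_BALLOT_175 | {"givenName", "surname"}


def identifying_ou_permitted(subject: dict, reading: str) -> bool:
    """Return True if an OU naming a person or entity may be included.

    reading="letter" applies the post-Ballot 175 attribute list;
    reading="spirit" applies the pre-Ballot 175 attribute list.
    """
    if reading == "letter":
        required = POST_BALLOT_175
    elif reading == "spirit":
        required = PRE_BALLOT_175
    else:
        raise ValueError(f"unknown reading: {reading!r}")
    # Only attributes with a non-empty value count as present.
    present = {name for name, value in subject.items() if value}
    return required <= present


# Hypothetical subjects for illustration.
dv_subject = {"commonName": "example.com"}
ov_subject = {
    "commonName": "example.com",
    "organizationName": "Example Ltd",
    "localityName": "Salford",
    "countryName": "GB",
}

print(identifying_ou_permitted(ov_subject, "spirit"))  # True
print(identifying_ou_permitted(ov_subject, "letter"))  # False
print(identifying_ou_permitted(dv_subject, "spirit"))  # False
```

Under either reading the DV subject fails the check, which is consistent with the point made elsewhere in this bug that the DV profile was not permitted even before Ballot 175.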

For DV, we now omit OU altogether.

You suggest that the modification from BR 1.4.1 to 7.1.4.2.2(i) to introduce the givenName and surname should be disregarded. Since that matches my 'spirit' of the BRs, I agree.

When it comes to revocation I am trying to acknowledge that our issuance of DV certificates with OUs that included Sectigo's trademarks or the name of a reseller was against the intent of the BRs.
It wasn't malicious and I do not think it was misleading, but those factors do not come into play.
It was a compliance failure against the intent of the BRs.

There are also two issues over revocation.
Firstly, which certificates must we revoke? Do we revoke the certificates that went against the letter of 7.1.4.2.2(i) or only those that went against the spirit? So far we have made that evaluation of numbers against the 'spirit', e.g. we have not included EV certificates that have an OU but don't have a givenName. We are content to discard the modification from BR 1.4.1 to 7.1.4.2.2(i) that introduced the givenName and surname requirement, but is that our decision to make?
Secondly, we have the issue previously mentioned about the consumer devices protected by many of these certificates.

We continue to work with our partners to get lists for bulk revocation and I will keep this bug updated with our progress.

We will propose a ballot in the CA/B forum to restate 7.1.4.2.2.i in a form of words whose meaning is clear and which permits some sensible compliant use of the OU attribute.

Robin,

Thanks for clarifying your point, and elaborating on your earlier statements.

You suggest that the modification from BR 1.4.1 to 7.1.4.2.2(i) to introduce the givenName and surname should be disregarded.

My point is that even when those changes are disregarded, the certificate profiles for most of the certificates would still be in violation of the BRs: e.g. "OU=Essential SSL,OU=Domain Control Validated" in the certificate profile dates back to at least 2012 and has thus probably been in violation since at least the previous requirements regarding OU usage, so "in the context of the current (and historic) web PKI" ticked me off a bit.

Regarding "disregarded": this is incorrect. I do not suggest that some section of the BR should be disregarded (= ignored), but that incomprehensible sections should be read carefully and, when not understood, community feedback can (and should) be requested. If sections of the BR do not carry the spirit of the BR, then a CA/B Forum ballot can be made with feedback. In the time between discovery of the faulty section of the BR and the enactment of the ballot with the changes, the root program (and the community) can provide input on what to do. Do note, though, that both the BR and the CA may be at fault at the same time, and thus action may be required on both sides.

Basically, I am for updating the BR to revert 7.1.4.2.2(i) to the version of BR 1.4.0, as I do agree with that being more to the spirit of 7.1.4.2.2(i), but that is not my call to make as a certificate consumer.

When it comes to revocation I am trying to acknowledge that our issuance of DV certificates with OUs that included Sectigo's trademarks or the name of a reseller was against the intent of the BRs.
It wasn't malicious and I do not think it was misleading, but those factors do not come into play.
It was a compliance failure against the intent of the BRs.

Thank you for this; it does clear up your previous, regrettably unclear comment. "When the requirements are comprehensible we will follow them." implies that incomprehensible requirements are not followed, and that is a dangerous precedent.

Firstly, which certificates must we revoke? Do we revoke the certificates that went against the letter of 7.1.4.2.2(i) or only those that went against the spirit?
So far we have made that evaluation of numbers against the 'spirit', e.g. we have not included EV certificates that have an OU but don't have a givenName.
We are content to discard the modification from BR 1.4.1 to 7.1.4.2.2(i) that introduced the givenName and surname requirement, but is that our decision to make?

This is also not my decision to make, but I believe the relevant m.d.s.p thread [0] had a resulting consensus that the OU is identifying part of the subject, thus must be validated against the subject information. Seeing that even OV/EV-certificates included OU fields with brands owned by Sectigo[1], I would suggest also adding those certificates to the revocation list as they include meaningful identifiers in the OU that are not identifying the subject [and are against both the BR and the spirit of the BR as understood by the m.d.s.p].

We continue to work with our partners to get lists for bulk revocation and I will keep this bug updated with our progress.

In Bug 1593776, you mentioned that 11,170,043 certificates are affected, and above you mentioned numbers regarding expiration by end June, and revoking replaced certificates.

Am I correct in assuming you are only revoking certificates that are being replaced, and not revoking other leaf certificates that have this problem?

[0] https://groups.google.com/forum/#!msg/mozilla.dev.security.policy/haYidcAYq8I/sVdgoyh-DAAJ
[1] The historical Sectigo and Comodo CPS up to and including 5.1.5, https://sectigo.com/legal

... continuing on the last comment, I accidentally hit save before fully completing my comment

Am I correct in assuming you are only revoking certificates that are being replaced, and not revoking other leaf certificates that have this problem?

If this assumption is not correct: what is the planning for the revocation of the remaining 5.5m certificates?
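
The 5.5m figure can be roughly reproduced from the numbers already posted in this bug. The sketch below is only a back-of-the-envelope check: it assumes that the 5,571,744 figure is cumulative (that is, it already includes the 4,679,280 certificates expired as of May-18-2020) and that none of the 102,123 revoked certificates are also counted among the expired ones. The variable names are illustrative.

```python
total_affected = 11_170_043      # from Bug 1593776, as cited above
expired_by_end_june = 5_571_744  # Sectigo's figure dated May-18-2020
replaced_and_revoked = 102_123   # Sectigo's figure dated May-18-2020

# Certificates still unexpired after the end of June.
remaining_after_expiry = total_affected - expired_by_end_june

# Of those, certificates not yet revoked (assuming no overlap with expired ones).
remaining_unrevoked = remaining_after_expiry - replaced_and_revoked

print(f"{remaining_after_expiry:,}")  # 5,598,299
print(f"{remaining_unrevoked:,}")     # 5,496,176
```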

Nick: Can you provide an update here? I'm fairly sure nearly two months is longer than the one week for responding to incident reports.

Flags: needinfo?(Robin.Alden) → needinfo?(nick)
Blocks: 1563579

Ryan: Yes - an update will be made in the next couple of days. I have a meeting at the end of this week (Friday, late UK time) with one subscriber with the bulk of the affected certificates.

Flags: needinfo?(nick)

The meeting with one subscriber was had late today, not Friday, with some good movement on revoking some significant volume of certificates with that single partner.

I also have data on the current remaining certificate numbers and month-by-month expiry data.

I will post tomorrow after internal review.

Currently there are approximately 4.8 million certificates that are still valid.

The valid certificates are spread across over 13,000 customers and partners.

Approximately 100,000 certificates have been revoked so far in both 'bulk' actions and natural revocations. We are still working to revoke more where possible.

Many have expired naturally - month-over-month expiry volumes:

Jun-20 588,166
Jul-20 494,025
Aug-20 464,806
Sep-20 462,423
Oct-20 500,347
Nov-20 523,072
Dec-20 434,132
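
For readers who want the running totals, the monthly figures above can be accumulated as follows. This is a simple illustrative calculation; it assumes the listed values are per-month expiry counts for the affected certificates and makes no claim about how they relate to the 4.8 million still-valid figure.

```python
from itertools import accumulate

# Monthly expiry volumes exactly as listed above.
monthly_expiries = [
    ("Jun-20", 588_166),
    ("Jul-20", 494_025),
    ("Aug-20", 464_806),
    ("Sep-20", 462_423),
    ("Oct-20", 500_347),
    ("Nov-20", 523_072),
    ("Dec-20", 434_132),
]

# Running total of expiries month by month.
running = list(accumulate(count for _, count in monthly_expiries))

for (month, count), total in zip(monthly_expiries, running):
    print(f"{month} {count:>9,} {total:>11,}")

total_expiring = running[-1]
print(f"Jun-20 to Dec-20 total: {total_expiring:,}")  # 3,466,971
```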

We have also worked with some of the larger-volume, direct partners who are in control of the subscriber certs and keys to do bulk reissues and revocations.
Many partners are resellers who in turn may have their own resellers.
We continue to work with those partners to action revocations.

Around 3 million of those 4.8 million certificates are with a single partner. The certificates themselves are installed on consumer devices (hardware storage appliances) that are used in the homes of consumers, and the certificates are there to allow access to the users' data over a secure channel. Clients consist of browsers as well as desktop and mobile applications.

We are working with the partner to determine what volume of certificates can be revoked, but discussions suggest around 20-25% (approximately 700,000) of the issued certificates could be revoked once a list is generated of devices which no longer have a certificate associated with them. Unfortunately this check can't be done passively or by the CA, so we are reliant on the partner to make this determination.

While we can revoke certificates for devices where users are not going to be impacted, we believe that revoking the certificates of active devices would have a significant impact on end users. The only available option to them once a certificate is revoked would be a factory reset, resulting in erasure of all stored data and settings, and the device restarts as if 'brand new' - which will then obtain a new certificate (without the OU).
Of course this is only in the case that the user is in fact able to perform the reset.
The devices do obtain a new certificate before the expiry of their current certificate which will result in a certificate with no OUs. However this process cannot be forced or initiated until a short time before the expiry of the existing certificate.

In addition:

  • Partner has stated that this infrastructure has been EOL'd with development of a replacement underway.
  • The replacement infrastructure does not rely on public certificates like the current infrastructure does.
  • The deployment of the new infrastructure and software will not begin for many months, and rollout to all devices could take over 18 months after that.
  • Existing devices will still be supported in getting new certificates when they are setup or factory-reset. These certificates are:
    • Provisioned after the mid-December date when Sectigo disabled OUs in DV certificates
    • Going to be issued as 1 year certificates from mid-August 2020. (The partnership with Sectigo (then Comodo) was established in early 2015, when 4 and 5 year certificates were available. Two-year certificates were used as the default - not for any specific reason that we are able to recall.)

Were we to repeat a similar integration (and there are a number of similar discussions happening), Sectigo will make two main recommendations to be followed if possible:

  • To use the shortest certificate term possible. This is now capped at a 1 year maximum anyway, but 90 days or less would be advised where practical for the use-case.
  • Ensure the system has mechanisms to replace the certificate outside of waiting for expiration, and in a way that is not disruptive to user experience but allows for mass-replacement of certificates in a short timeframe.
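As a rough illustration of the second recommendation, the sketch below shows a client-side check that replaces a certificate either once a fixed fraction of its lifetime has elapsed or as soon as an out-of-band "replace now" signal is received. It is a minimal sketch under stated assumptions, not a description of the partner's devices or of any Sectigo product: the function name `should_replace`, the `replace_requested` flag and the two-thirds renewal fraction are all hypothetical.

```python
from datetime import datetime, timedelta


def should_replace(not_before, not_after, now,
                   replace_requested=False, renew_fraction=2 / 3):
    """Decide whether a device should fetch a replacement certificate.

    replace_requested is a hypothetical out-of-band signal (for example,
    set by the vendor when the CA must revoke); renew_fraction is an
    assumed policy value, not something taken from this bug.
    """
    # An explicit replacement request always wins, regardless of age.
    if replace_requested:
        return True
    # Otherwise renew once the assumed fraction of the lifetime has passed.
    lifetime = not_after - not_before
    return now >= not_before + lifetime * renew_fraction


if __name__ == "__main__":
    issued = datetime(2020, 1, 1)
    expires = issued + timedelta(days=90)
    print(should_replace(issued, expires, datetime(2020, 2, 15)))
    print(should_replace(issued, expires, datetime(2020, 1, 2),
                         replace_requested=True))
```

The point of the explicit flag is that replacement no longer depends on the certificate approaching expiry, which is the limitation described above for the existing devices.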

Once we confirm the number of certificates from this partner for revocation and a date for the action, I will update this ticket - or provide an update within 7 days, whichever is sooner.

Additionally: watching the open issues from other CAs with respect to recent large-scale revocations, I/we do have some other thoughts and comments (and questions!) with regard to revocation and automation. These cover some of the things we do and are doing to help, the challenges we face, and the areas we must investigate more deeply.
They aren't necessarily directly related to this bug - would it perhaps be more appropriate for an m.d.s.p discussion?

(In reply to Nick France from comment #16)

Around 3 million of those 4.8 million certificates are with a single partner. The certificates themselves are installed on consumer devices (hardware storage appliances) that are used in the homes of consumers, and the certificates are there to allow access to the users' data over a secure channel. Clients consist of browsers as well as desktop and mobile applications.

We are working with the partner to determine what volume of certificates can be revoked, but discussions suggest around 20-25% (approximately 700,000) of the issued certificates could be revoked once a list is generated of devices which no longer have a certificate associated with them. Unfortunately this check can't be done passively or by the CA, so we are reliant on the partner to make this determination.

While we can revoke certificates for devices where users are not going to be impacted, we believe that revoking the certificates of active devices would have a significant impact on end users. The only available option to them once a certificate is revoked would be a factory reset, resulting in erasure of all stored data and settings, and the device restarts as if 'brand new' - which will then obtain a new certificate (without the OU).
Of course this is only in the case that the user is in fact able to perform the reset.
The devices do obtain a new certificate before the expiry of their current certificate which will result in a certificate with no OUs. However this process cannot be forced or initiated until a short time before the expiry of the existing certificate.

How are these devices updated? Is this something being fixed?

I'm just trying to understand how such a significant partnership would happen, given the widely understood risk that a lack of CA flexibility poses. That is, put differently, if these systems were designed to support the Web, they naturally would have incorporated CA flexibility to handle things like the retiring of CAs and/or business continuity events, and that would have included the ability to replace certificates "out of band".

If you're telling me these 3 million devices have roughly no way to update, and no way to alternate CAs, my concern is: why the heck are they using a publicly trusted PKI in the first place? I understand this is ultimately a question to the vendor (which I'm guessing may be Western Digital, based on the volume described), but I'm trying to understand the decision making process for Sectigo to enter into a relationship that would jeopardize its BR compliance so significantly, and what can be done to correct this?

I'm not sure the rest is really reassuring. You'll "recommend" it, but you're required to ensure, in a contractually and legally enforceable way, that you can do so. The point of this requirement in the BRs is for CAs to take the lead in taking the necessary steps to ensure this. At best, I can't help but feel this response is trying to make it "somebody else's problem" to expect that Sectigo will comply with the BRs. I can understand not wanting to cause disruption, and that's not the goal, but I can't help but feel like the response is "We'll comply when it's convenient, but we can't help how folks use our certificates".

The whole point of the BRs and Root Programs is for CAs to make binding commitments to browsers/relying parties: "we commit to do these actions, and will design our business to make sure they happen". I would hope for something more meaningful than "Well, we'll recommend they understand we have to follow the BRs, except when they do something so bad we decide not to".

For example, a more substantive response is "We're highlighting to every one of our sales team employees to emphasize that certificates MUST be revoked within X days, and that this can happen at any time. We're taking these steps to make sure our customers have options to replace them. We're taking these steps to ensure our customers know. We're committing that, going forward, we won't make exceptions, because we told our users what to expect" or something... better?

It's no different than if we were to discover a domain validation bug that prevented proper validation from happening. We'd see wholesale cert revocation then, for good reason. Or, similarly, heartbleed. Or, more practically, and why I emphasize this so much: post-quantum cryptography. We need our systems now to be robust to updates and replacements, and to be identifying the challenges that prevent this. Sectigo, as with other CAs, are supposed to be the ones taking the difficult choices now so we can minimize long-term impact. I'm not seeing that yet.

Flags: needinfo?(Robin.Alden)
QA Contact: wthayer → bwilson

Still working on a longer response to Ryan's comments in #c17 - those will be posted shortly.

Is shortly a period less than or greater than two weeks?

Flags: needinfo?(Robin.Alden) → needinfo?(nick)

Our understanding is that the existing devices do not have a software update mechanism that can be used to force a software update remotely. As mentioned in the original explanation, a firmware update which the customer manually initiates is the only option. This is manual and requires restoration of the device and data on it. As you suggest, these are indeed devices from Western Digital.
The devices themselves do not have a way to remotely force an update or to fetch a certificate from an 'alternate' CA, although of course the vendor could integrate certificates from another provider (though I do not believe they intend to do so given that we were told the platform has been deprecated).

The original requirement for web PKI certificates was to support HTTPS and TLS enabled access to the interfaces of the device without certificate warning messages as would occur with self-signed or privately-signed certificates.
We did not believe back then that the relationship with this partner could jeopardise BR compliance, but the impact of the lack of flexibility in this solution has influenced current and future discussions with customers regarding similar integrations.

We are still waiting to confirm the final list of revocations with the partner. We have also asked about the possibility to migrate (for the devices they are still providing certificates to) to shorter-lifetime certificates - shorter than 365 days. We may be able to do this without them requiring changes to their existing infrastructure. Of course, the change would need to be coordinated with them, and that process is not happening rapidly right now.

We do agree that there is an overwhelming need to ensure that any and all certificates can be replaced rapidly, easily and minimising impact to the subscriber. We're all aware there is no silver-bullet solution, but shorter certificate lifetimes are the best start.

Indeed we at Sectigo are already driving toward shorter certificate durations to minimise the impact when certificates must be revoked. We were one of a small number of CAs with a 'yes' vote on SC22.
Currently in 2020, over 90% of publicly-trusted server certificates issued by Sectigo are issued for 90 days (quick analysis of the CT log data was at around 92% for 90-100 day durations). 5% are 1-year and 3% are 2 year certificates.
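For readers who want to reproduce this kind of breakdown, the following is a minimal sketch of bucketing certificates by validity period. It is not the analysis Sectigo ran: the function names, the bucket boundaries (up to 100 days, up to 398 days, longer) and the sample data are assumptions chosen only to mirror the categories mentioned above.

```python
from collections import Counter
from datetime import datetime


def bucket(not_before, not_after):
    """Classify a certificate by its validity period in days.

    The thresholds are illustrative assumptions, not values from this bug.
    """
    days = (not_after - not_before).days
    if days <= 100:
        return "90-day"
    if days <= 398:
        return "1-year"
    return "2-year"


def duration_shares(certs):
    """Return the percentage of certificates falling in each bucket.

    certs is an iterable of (not_before, not_after) datetime pairs, for
    example extracted from CT log entries by some other tool.
    """
    counts = Counter(bucket(nb, na) for nb, na in certs)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample, not real issuance data.
    sample = [
        (datetime(2020, 1, 1), datetime(2020, 3, 31)),
        (datetime(2020, 2, 1), datetime(2020, 5, 1)),
        (datetime(2020, 1, 1), datetime(2021, 1, 1)),
        (datetime(2020, 1, 1), datetime(2022, 1, 1)),
    ]
    print(duration_shares(sample))
```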

Automation is also critical - we have offered and advocated for automation for a very long time. Many of our customers wouldn't be customers had we not been able to offer an automated solution. Almost all new reseller partners are onboarded 'API first' - their accounts enabled for automated certificate provisioning and API guides and documentation provided from day one discussions with them.
Enterprises too are given a platform that offers our own APIs, as well as industry standards such as ACME, SCEP and EST.
Of course our 'influence' as a CA cannot always be driven into every part of the infrastructure of a large organisation, nor can we push it to the top of agendas.

One part of the work that remains to be done is to better understand the segment of certificate subscribers who still choose to request certificates manually. I'm sure this is prevalent throughout most of the CAs. Given the availability of automated options - ACME, Cloudflare, cPanel, WordPress, almost any modern hosting platform - these users choose the most circuitous path to using TLS, and frankly we don't fully understand why.
Education will play a part, and our marketing: blogs, webinars, sales and pre-sales material all push on the need for automation and the ways Sectigo support it. However there is still a knowledge gap here in understanding why and how there are so many end-user subscribers who seemingly do not use the automation that is available.

I know Jeremy has mentioned in [1651828] that DigiCert have been able to identify which certificates are 'issued using automation compared to those that aren’t.' We're not able to do that same analysis. The majority of our partners obtaining certificates in any volume are automated, but that does not mean that the actual subscriber used automation in the provisioning of the certificate. Deeper analysis is needed, much like the above mentioned research into subscribers who choose to provision certificates manually. This is made somewhat harder without a direct line of communication between us and the certificate user, but it's something we are investigating nonetheless.

To dig deeper on the issue, Tim (Callan) has already begun to arrange a series of internal workshops during September at Sectigo, involving many staff across all disciplines. The overall topic is 'certificate agility' - and what we can do as a CA to better improve our users' (not just customers') ability to change certificates with little to no notice.
The intent is to cover everything from better understanding the users (as above) to marketing, education, sales techniques, technical developments, tools and implementations, partnerships, contractual and legal avenues to look at, and then our own compliance changes in terms of preventing these issues and rotation of CAs and so on. We're not just limiting the scope to the WebPKI either, as many lessons learned could be applied to ecosystems beyond this one.
Practical outcomes of these sessions will be reflected in our products and offerings and added to any Bugzilla tickets or m.d.s.p. discussions as required.

Flags: needinfo?(nick)

We have no further update.

We have no further update.

We have no further update.

We have no further update.

We have no further update.

Assignee: Robin.Alden → nick

We have no further update.

We have no update at this time.

We have some updates from the customer, and we are currently processing the information from them in order to identify the list of affected, valid certificates that we can revoke.
We will update with numbers when we have them, as well as a date for those to be revoked.

On Friday 30th October we processed 737,637 revocations for certs issued under the wd2go.com domain.

We are also reviewing to see if there are further revocations we can make.

Additionally, we are working with WDC to migrate to 90-day certificate duration (over the 365 day currently), and we will update once we have progress there.

We have no more updates at present.

Whiteboard: [ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [delayed-revocation-leaf] Next update 2020-12-01
Flags: needinfo?(nick)
Flags: needinfo?(tim.callan)

We have no update at this time.

Flags: needinfo?(tim.callan)
Whiteboard: [ca-compliance] [delayed-revocation-leaf] Next update 2020-12-01 → [ca-compliance] [delayed-revocation-leaf] Next update 2021-01-05

We have no update at this time.

We have no update at this time.

Flags: needinfo?(nick) → needinfo?(bwilson)

At this point, I'm inclined to close this matter as having had adequate discussion - with the understanding that Sectigo will continue to be diligent with its customers and revoke misissued certificates on a timely basis. I plan to close it on or about next Wednesday 27-Jan-2021.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] [delayed-revocation-leaf] Next update 2021-01-05 → [ca-compliance] [leaf-revocation-delay]