Closed Bug 1639804 Opened 4 years ago Closed 4 years ago

Sectigo: Failure to revoke key-compromised certificate within 24 hours

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mpalmer, Assigned: Robin.Alden)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Steps to reproduce:

At 2020-05-13 21:50:05 UTC, a certificate problem report was delivered to sectigo-com.mail.protection.outlook.com (104.47.46.36) on behalf of sslabuse@sectigo.com, stating that a private key with SPKI 2ed30b85a9ee8099e20a99e295d9a980f256a99511229715778a69617775ed2b had been compromised, and requesting that all certificates issued by Sectigo using that SPKI be revoked. The URL of a CSR attesting to the compromise of the private key, signed by the compromised private key, was provided.

Actual results:

Revocation was effected at 2020-05-15 00:33:00 UTC, based on the timestamp contained within signed OCSP responses for certificates using the specified SPKI.

Expected results:

The certificate to have been revoked within 24 hours of the certificate problem report being sent.
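
For reference, the gap between the two timestamps quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative, and the 24-hour figure is the deadline named in this report.

```python
from datetime import datetime, timedelta, timezone

# Timestamps as quoted in this report (both UTC).
REPORT_SENT = datetime(2020, 5, 13, 21, 50, 5, tzinfo=timezone.utc)
REVOKED_AT = datetime(2020, 5, 15, 0, 33, 0, tzinfo=timezone.utc)

# Revocation deadline for a key-compromise problem report.
DEADLINE = timedelta(hours=24)

elapsed = REVOKED_AT - REPORT_SENT
overrun = elapsed - DEADLINE

print(f"elapsed: {elapsed}")                        # elapsed: 1 day, 2:42:55
print(f"past the 24-hour deadline by: {overrun}")   # past the 24-hour deadline by: 2:42:55
```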

Assignee: bwilson → Robin.Alden
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]
Blocks: 1563579

I apologize for the slow response on this bug. I acknowledge the report and will follow up with a problem report in the usual format.

Flags: needinfo?(Robin.Alden)
Whiteboard: [ca-compliance] → [ca-compliance] Next Update 15-July 2020
  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

An issue was reported on Saturday, May 13, 2020 5:50 PM ET to our abuse email account, indicating that one or more certificates issued by our CA are using a key which has been compromised due to public disclosure.

The report also indicated that a list of known certificates using this key can be retrieved from:
https://crt.sh/?spkisha256=2ed30b85a9ee8099e20a99e295d9a980f256a99511229715778a69617775ed2b
as Matt indicates in the bug.
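
As an aside for readers unfamiliar with the lookup, here is a minimal sketch of how such a URL can be derived. It assumes the `spkisha256` value is the hex SHA-256 digest of the DER-encoded SubjectPublicKeyInfo; the function names are hypothetical, and extracting the SubjectPublicKeyInfo bytes from a certificate or key is not shown.

```python
import hashlib

def spki_sha256_hex(spki_der: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded SubjectPublicKeyInfo (assumed input)."""
    return hashlib.sha256(spki_der).hexdigest()

def crtsh_spki_url(fingerprint_hex: str) -> str:
    """Build the crt.sh query URL for a given SPKI fingerprint."""
    return f"https://crt.sh/?spkisha256={fingerprint_hex.lower()}"

# The fingerprint quoted in this bug.
REPORTED_SPKI = "2ed30b85a9ee8099e20a99e295d9a980f256a99511229715778a69617775ed2b"

print(crtsh_spki_url(REPORTED_SPKI))
```

In practice the two helpers would be chained, with `spki_sha256_hex` fed the public key bytes of the compromised key.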

Note: Times are in ET (Eastern Time) due to the use of our internal tools.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Email received: Saturday, May 13, 2020 5:50 PM ET
Reply sent to reporter: Sunday May 14, 1:01 PM ET, acknowledging the report and assigning an internal case number in our systems
Internal revocation request sent: Sunday May 14, 1:12 PM ET, referencing the case and the reason, and scheduling the revocation for the same day at 8:30 PM ET
Cert revoked: Sunday May 14, 8:33 PM ET (as Matt indicates, 2020-05-15 00:33:00 UTC)
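
To relate these ET entries to the UTC timestamps in the original report, the conversion can be sketched as follows. This assumes "ET" here means Eastern Daylight Time (UTC-4), which applies in May, and uses minute precision because that is all the timeline gives; the `EVENTS` list and its labels are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "ET" in May is Eastern Daylight Time, a fixed UTC-4 offset.
EDT = timezone(timedelta(hours=-4), "EDT")

EVENTS = [
    ("Email received", datetime(2020, 5, 13, 17, 50, tzinfo=EDT)),
    ("Reply sent to reporter", datetime(2020, 5, 14, 13, 1, tzinfo=EDT)),
    ("Internal revocation request sent", datetime(2020, 5, 14, 13, 12, tzinfo=EDT)),
    ("Cert revoked", datetime(2020, 5, 14, 20, 33, tzinfo=EDT)),
]

received = EVENTS[0][1]
for label, when in EVENTS:
    utc = when.astimezone(timezone.utc)
    # Show each step in UTC along with the delay since the report arrived.
    print(f"{utc:%Y-%m-%d %H:%M} UTC  +{when - received}  {label}")
```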

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    One certificate was affected; it was not revoked within the 24-hour deadline.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

This is the certificate affected:
https://crt.sh/?id=2802219672
and this is the same certificate with its OCSP response:
https://crt.sh/?id=2802219672&opt=ocsp

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Checking the certificate problem reporting queue at weekends was an issue in May.
We were struggling due to some external issues that affected our capacity to check out the abuse queue within the appropriate timeline.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

a) At the end of May we added one more person to check the queue, and since then we have been much better at catching the reports during non-working hours.
There is no good way to catch these specific reports automatically, because it is always a human that needs to check the queue and find them, so we're training and assigning additional people to help check and manage abuse reports and to act as backups.
b) This final statement does not relate to the particular compromise reports here, but we would be grateful to receive future reports of compromised keys through the automated mechanisms we have provided for this. On the 22nd of May (about a week after this incident) we also updated our CPS, available under https://sectigo.com/legal , which now includes the details of these mechanisms, but for convenience I include a snippet from that document here.
• Sectigo also operates an automated revocation portal at https://secure.sectigo.com/products/RevocationPortal
where Subscribers/Domain owners may revoke their certificates, or the public may report and revoke certificates for which the private key has been compromised.
• The public may also report and revoke certificates for which the private key has been compromised by using the ACME revokeCert method at this endpoint:
ACME Directory: https://acme.sectigo.com/v2/keyCompromise
revokeCert API: https://acme.sectigo.com/v2/keyCompromise/revokeCert
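
For readers who have not used ACME revocation before, here is a rough sketch of the unsigned pieces of a revokeCert request as described in RFC 8555, section 7.6. It assumes the endpoint above follows that RFC for requests signed with the certificate's own key; the helper names are hypothetical, and fetching a nonce, signing the JWS with the compromised key, and sending the POST are all omitted.

```python
import base64
import json

# CRLReason code for keyCompromise (RFC 5280).
KEY_COMPROMISE = 1

def b64url(data: bytes) -> str:
    """base64url without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def revoke_payload(cert_der: bytes, reason: int = KEY_COMPROMISE) -> str:
    """JSON payload: the DER certificate (base64url) plus a reason code."""
    return json.dumps({"certificate": b64url(cert_der), "reason": reason})

def protected_header(jwk: dict, nonce: str, url: str, alg: str = "RS256") -> str:
    """JWS protected header. A request signed by the certificate's key
    carries that public key in "jwk" rather than an account "kid"."""
    return json.dumps({"alg": alg, "jwk": jwk, "nonce": nonce, "url": url})

# Placeholder bytes only; a real request would use the certificate's DER encoding.
print(revoke_payload(b"\x30\x00"))
```

Because the request is signed with the certificate's private key, possession of the key is demonstrated by the signature itself, which is what lets this path be processed without a human reading an email.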

Checking the certificate problem reporting queue at weekends was an issue in May.

This is a problematic situation for Sectigo to have been in, given CAs are required to provide a 24x7 problem report processing service. Can Sectigo provide more details as to how this situation came about, and how Sectigo intends to ensure such a situation cannot come to pass again in the future?

We were struggling due to some external issues that affected our capacity to check out the abuse queue within the appropriate timeline.

Was Sectigo aware of these issues prior to this incident report being filed? If so, what steps, if any, had Sectigo taken to mitigate these issues?

we would be grateful to receive future reports of compromised keys through the automated mechanisms we have provided for this.

Whilst I obviously can't speak for all reporters of key compromise, for the reports I provide via the Revokinator, all I can say is "don't hold your breath". The Revokinator does not currently support key compromise notifications via ACME (because at the time it was written, no CA supported BR-compliant revocation via ACME), and all my free time for the foreseeable future will be consumed developing a reliable replacement for the parts of crt.sh that the Revokinator relied on to match compromised keys to certificates and their issuers.

(In reply to mpalmer from comment #3)

Checking the certificate problem reporting queue at weekends was an issue in May.

This is a problematic situation for Sectigo to have been in, given CAs are required to provide a 24x7 problem report processing service. Can Sectigo provide more details as to how this situation came about, and how Sectigo intends to ensure such a situation cannot come to pass again in the future?

We had staff out of position due to COVID and that reduced our capacity.
As Iñigo mentioned, at the end of May we added another staff member to the team that responds to problem reports.

We were struggling due to some external issues that affected our capacity to check out the abuse queue within the appropriate timeline.

Was Sectigo aware of these issues prior to this incident report being filed? If so, what steps, if any, had Sectigo taken to mitigate these issues?

We were aware that we had staff not able to contribute to the handling of problem reports and we moved team members around to backfill the shortfall. We were not aware that this was going to be the cause of a failure to meet policy.

we would be grateful to receive future reports of compromised keys through the automated mechanisms we have provided for this.

Whilst I obviously can't speak for all reporters of key compromise, for the reports I provide via the Revokinator, all I can say is "don't hold your breath". The Revokinator does not currently support key compromise notifications via ACME (because at the time it was written, no CA supported BR-compliant revocation via ACME), and all my free time for the foreseeable future will be consumed developing a reliable replacement for the parts of crt.sh that the Revokinator relied on to match compromised keys to certificates and their issuers.

Thanks for the update. If you have problems to report with crt.sh I am sure Rob would welcome them at https://crt.sh/forum.

This incident occurred before we changed our CPS to indicate that we were unable to respond to emailed key compromised reports in a timely manner.

Flags: needinfo?(Robin.Alden)

We were not aware that this was going to be the cause of a failure to meet policy.

Well yes, because that would require the ability to see the future. However, presumably Sectigo made estimates of the required workload, the effort required to process that required workload, and decided whether the risk of possibly not meeting policy was acceptable. Were there any external circumstances, such as an unforeseeable "spike" in problem reports, that invalidated Sectigo's risk analysis, or is this just a case of Sectigo making a gamble that, this time around, didn't pay off?

We had staff out of position due to COVID and that reduced our capacity.
[...]
We were aware that we had staff not able to contribute to the handling of problem reports

If Sectigo did not have the capacity to process problem reports for the certificates they issued, did Sectigo consider reducing issuance volume until sufficient problem report processing capacity was available?

If you have problems to report with crt.sh I am sure Rob would welcome them

Rob is already aware. His response did not inspire sufficient confidence for me to be able to continue to rely on crt.sh.

Matt, I presume you're referring to this Twitter thread:
https://twitter.com/pwnedkeys/status/1278302601226801152

We operate crt.sh to the best of our ability. If you've decided to not rely on crt.sh because it doesn't have infinite capacity, then that's fine. There are use cases to which crt.sh is suited, and there are use cases to which crt.sh is not suited.

However, you appear to be using your decision (to stop using crt.sh) to justify why you think Sectigo's updated Problem Reporting Mechanism (as described in our latest CPS) is unreasonable. This makes no sense to me.

BTW, I was quite offended by your claim in that tweet that "if you use it (crt.sh) to make them look bad, they'll block you from using it". Whilst it's entirely reasonable and sensible to "constantly view CAs through the lens of distrusting them" (quoting an m.d.s.p post from Sleevi yesterday), I think you have gone too far when you present as fact - without any supporting evidence - any claim that a CA is acting dishonestly.

However, you appear to be using your decision (to stop using crt.sh) to justify why you think Sectigo's updated Problem Reporting Mechanism (as described in our latest CPS) is unreasonable. This makes no sense to me.

It makes no sense to me that you interpreted my statements that way -- I can't see anything that relates my opinions of Sectigo's updated CPS to crt.sh (or a lack thereof).

Assuming that by "updated Problem Reporting Mechanism" you mean the ACME revokeCert endpoint, I think the fact that I wrote a standalone ACME revocation server should dispel any suggestion that I am somehow against using ACME for key compromise revocation. I'm all for automation, and I think it's wonderfully reasonable.

My statement "don't hold your breath", with regards to using Sectigo's ACME revocation endpoint, was made in the context of a timeline for having the Revokinator send revocation requests to Sectigo's ACME service. Since I can't use crt.sh any more, I have to prioritise my Copious Free Time towards building something that'll replace what I was using crt.sh for, so that I can resume removing certificates with compromised keys from circulation. The need to do that work, ahead of the many other things going on in my life, will significantly delay the ACME support that you and other CAs would prefer to see happen. Hence, "don't hold your breath" for the Revokinator using ACME, because you'll go very blue in the face before I have a hope of getting to it.

On the other hand, if you're referring to "we changed our CPS to indicate that we were unable to respond to emailed key compromised reports in a timely manner", then yes, I believe that to be unreasonable, but again I cannot see how that relates to crt.sh. I don't see any carve-out in the BRs that allows revocation to be delayed for an arbitrary period of time just because the problem report arrived via e-mail.

I was quite offended by your claim in that tweet that "if you use it (crt.sh) to make them look bad, they'll block you from using it".

If my conclusion was erroneous, I apologise. Based on the data I had available (no recent change in the Revokinator's usage pattern or overall query rates, a lot of unanswered Sectigo incident reports, a straight-up ECONNREFUSED block, no attempt at communication before or after dropping the banhammer, no documented rate limits for crt.sh queries), it was the most logical conclusion to draw at the time. Again, if that conclusion was incorrect, you have my apologies.

(In reply to mpalmer from comment #8)

However, you appear to be using your decision (to stop using crt.sh) to justify why you think Sectigo's updated Problem Reporting Mechanism (as described in our latest CPS) is unreasonable. This makes no sense to me.

It makes no sense to me that you interpreted my statements that way -- I can't see anything that relates my opinions of Sectigo's updated CPS to crt.sh (or a lack thereof).

In that case, I apologize for getting the wrong end of that particular stick.

Assuming that by "updated Problem Reporting Mechanism" you mean the ACME revokeCert endpoint, I think the fact that I wrote a standalone ACME revocation server should dispel any suggestion that I am somehow against using ACME for key compromise revocation. I'm all for automation, and I think it's wonderfully reasonable.

Excellent. Thanks for confirming that you still hold that view. (I had been wondering!)

My statement "don't hold your breath", with regards to using Sectigo's ACME revocation endpoint, was made in the context of a timeline for having the Revokinator send revocation requests to Sectigo's ACME service. Since I can't use crt.sh any more,

I'm going to read that as s/can't/won't/, at least for now. And let me repeat my offer (in my reply to your tweet) to liaise with our Ops team to find out if/why you were blocked and to get you unblocked.

I have to prioritise my Copious Free Time towards building something that'll replace what I was using crt.sh for, so that I can resume removing certificates with compromised keys from circulation. The need to do that work, ahead of the many other things going on in my life, will significantly delay the ACME support that you and other CAs would prefer to see happen. Hence, "don't hold your breath" for the Revokinator using ACME, because you'll go very blue in the face before I have a hope of getting to it.

Out of interest, is the Revokinator open-source? I just looked at https://github.com/pwnedkeys and https://github.com/tobermorytech, but couldn't find it. Is adding ACME support to your Revokinator something that anybody else could usefully help with? (I can appreciate that it may well not be; I know from working on crt.sh that scaling a project beyond 1 person can be hard ;-) ).

On the other hand, if you're referring to "we changed our CPS to indicate that we were unable to respond to emailed key compromised reports in a timely manner", then yes, I believe that to be unreasonable, but again I cannot see how that relates to crt.sh. I don't see any carve-out in the BRs that allows revocation to be delayed for an arbitrary period of time just because the problem report arrived via e-mail.

The BRs (section 4.9.3) mandate that:
"The CA SHALL maintain a continuous 24x7 ability to accept and respond to revocation requests and Certificate Problem Reports.
The CA SHALL provide Subscribers, Relying Parties, Application Software Suppliers, and other third parties with clear instructions for reporting suspected Private Key Compromise, Certificate misuse, or other types of fraud, compromise, misuse, inappropriate conduct, or any other matter related to Certificates. The CA SHALL publicly disclose the instructions through a readily accessible online means and in section 1.5.2 of their CPS."

Note that there's also no carve-out in the BRs that allows revocation to be delayed for an arbitrary period of time just because the problem report was put "on display at your local planning department in Alpha Centauri" (https://www.goodreads.com/quotes/379100-there-s-no-point-in-acting-surprised-about-it-all-the) ! But both that, and the lack of a carve-out for problem reports received via email, are equally irrelevant in my view.

We have provided "clear instructions for reporting suspected Private Key Compromise..." in our CPS, as you are aware. Those instructions form our Problem Reporting Mechanism. I'm not aware of any rules mandating that a CA MUST accept Problem Reports via email (see https://github.com/mozilla/pkipolicy/issues/98) or that a CA MUST accept all types of Problem Report at the same contact point or via the same technical mechanism(s).

In the "clear instructions for reporting suspected Private Key Compromise" in our CPS, we provide several options. Unsurprisingly, one of those options is not "Print out a hex dump of the private key and post it to your local planning department in Alpha Centauri", because there's obviously no way we would be able to handle such reports (and contrary to your apparent expectations - https://twitter.com/tobermatt/status/1278863781142179840 - we're not looking for any root programs to "Energize the demolition beams"!)

Silliness aside, the same principle (of what we can and can't handle) also applies to reports received via email, and so we chose to adjust our Problem Reporting Mechanism accordingly.

I'm well aware that you don't like this situation, but that doesn't mean we're acting in a non-compliant fashion. If a future update to the BRs or Mozilla Root Store Policy does mandate that CAs MUST accept key compromise reports via email, then we will of course perform another review and make appropriate changes to our CPS.

I was quite offended by your claim in that tweet that "if you use it (crt.sh) to make them look bad, they'll block you from using it".

If my conclusion was erroneous, I apologise. Based on the data I had available (no recent change in the Revokinator's usage pattern or overall query rates, a lot of unanswered Sectigo incident reports, a straight-up ECONNREFUSED block, no attempt at communication before or after dropping the banhammer, no documented rate limits for crt.sh queries), it was the most logical conclusion to draw at the time. Again, if that conclusion was incorrect, you have my apologies.

Apology accepted! It certainly sounds like our Ops team must've unilaterally decided to block your IP address due to (what they'll have perceived as) "abuse" of the crt.sh service. It's highly unlikely that they'll have attempted to identify or contact the natural person responsible (i.e., you).

(In reply to Rob Stradling from comment #9)

I'm well aware that you don't like this situation, but that doesn't mean we're acting in a non-compliant fashion. If a future update to the BRs or Mozilla Root Store Policy does mandate that CAs MUST accept key compromise reports via email, then we will of course perform another review and make appropriate changes to our CPS.

It's relevant to highlight some related discussions, see Bug 1639794 and Bug 1650234. Given Sectigo's issues responding timely to its own incidents, I am of course concerned about their ability to follow other incidents, and so want to draw attention to some of the broader ecosystem trends.

However, I am not sure how to read Sectigo's CPS, v5.2, Section 1.5.2.1, as anything other than supporting revocation via e-mail and committing to timely reports. Contrary to the statements here, it does not appear to carve out the revocation portal/ACME endpoint as being the only path to report revocation. Perhaps that was the intent with the opening of 1.5.2.1, but if that were the case, it would appear Sectigo is failing to provide a method for reporting on the other elements noted within Section 4.9.1 of the BRs.

This would appear to be, like Bug 1650234, an inconsistent CPS.

And let me repeat my offer (in my reply to your tweet) to liaise with our Ops team to find out if/why you were blocked and to get you unblocked.

I don't see how that helps. Since, as you previously asserted, Sectigo's ops team blocked the Revokinator because it was putting too much load on the crt.sh database cluster, there's no point in removing the block because the Revokinator is still going to put the same amount of load on the crt.sh database cluster in the future -- people aren't showing any eagerness to stop publicising their private keys. Hence why I deliberately used "can't" instead of "won't" -- because as you've stated, crt.sh does not have infinite capacity and there are some things that crt.sh is not a good fit for, the Revokinator apparently being one of them.

Out of interest, is the Revokinator open-source?

No, because the nebulous benefits didn't appear to outweigh the time costs required. The Revokinator isn't a completely stand-alone system; without some mechanism for feeding it compromised keys, it doesn't do anything useful, and open-sourcing the rest of the pwnedkeys system as well is a very non-trivial exercise. There's a lot of moving parts and interdependencies in there, none of which I've ever gotten around to documenting -- another victim of my lack of copious free time.

At any rate, in my experience, trying to swallow a big PR from someone unfamiliar with a codebase takes at least as much time as just writing the code myself. So I could take the time to open source the Revokinator, someone else could take the time to understand the code, database schema, and the change requirements, then write the PR, but the ACME support still wouldn't land any quicker because reviewing the PR wouldn't happen any sooner or quicker than if I'd written the code myself.

I'm well aware that you don't like this situation, but that doesn't mean we're acting in a non-compliant fashion.

The BRs require "clear instructions" for reporting suspected private key compromise. Pointing to an ACME revokeCert endpoint does not, to my mind, constitute "clear instructions" -- I don't think that sufficient familiarity with ACME and its tooling can be considered common knowledge at this point in history. Thus, "here's an ACME directory knock yourself out" doesn't constitute "clear instructions". You also cannot report "suspected" key compromise via revokeCert, which the BRs require a CA to accept and process on the same timeline as all other problem reports.

No doubt Sectigo can re-write their CPS to fit the cracks in the BR / Mozilla Policy language (potentially with a few more rounds of "here's a new CPS!" / "here's why the new CPS is broken!" to crowdsource BR interpretation -- a common CA practice that I happen to abhor), but surely the time and energy expended doing that would be better spent responding to the numerous outstanding incident reports and improving Sectigo's CA operations such that it wouldn't require such sophistry to remain compliant.

At the end of the day, though, given that neither of us is anyone special in the Mozilla CA module, I guess we'll have to agree to disagree on this one, and await the judgement of those who do have Super Cow Powers.

(In reply to mpalmer from comment #11)

And let me repeat my offer (in my reply to your tweet) to liaise with our Ops team to find out if/why you were blocked and to get you unblocked.

I don't see how that helps. Since, as you previously asserted, Sectigo's ops team blocked the Revokinator because it was putting too much load on the crt.sh database cluster,

I didn't assert, Matt. I speculated, which is all I can do given that you won't tell me your IP address(es).

there's no point in removing the block because the Revokinator is still going to put the same amount of load on the crt.sh database cluster in the future

That statement assumes that the crt.sh code, database and infrastructure are already operating optimally such that there exist no opportunities for improving performance. That's not an assumption that I am making.

...At any rate, in my experience, trying to swallow a big PR from someone unfamiliar with a codebase takes at least as much time as just writing the code myself.

I thought you might say that. That's often been my experience too.

I'm well aware that you don't like this situation, but that doesn't mean we're acting in a non-compliant fashion.

The BRs require "clear instructions" for reporting suspected private key compromise. Pointing to an ACME revokeCert endpoint does not, to my mind, constitute "clear instructions" -- I don't think that sufficient familiarity with ACME and its tooling can be considered common knowledge at this point in history. Thus, "here's an ACME directory knock yourself out" doesn't constitute "clear instructions".

We agree. Hence why our "clear instructions" also mention our Revocation Portal at https://secure.sectigo.com/products/RevocationPortal.

You also cannot report "suspected" key compromise via revokeCert, which the BRs require a CA to accept and process on the same timeline as all other problem reports.

If you "suspect" that a key has been compromised, but you don't actually have a copy of that key, then...what exactly will your report consist of?
What proof will you provide? Are you saying that a CA MUST provide a way for third-parties to send "problem reports" (such as "I suspect that the private key corresponding to certificate X is currently being displayed in plaintext at the planning department in Alpha Centauri") that the CA is guaranteed to not be able to verify?

A certificate Subscriber might suspect that their key has fallen into the wrong hands. If so, they have plenty of options for getting their Sectigo certificate revoked.

No doubt Sectigo can re-write their CPS to fit the cracks in the BR / Mozilla Policy language (potentially with a few more rounds of "here's a new CPS!" / "here's why the new CPS is broken!" to crowdsource BR interpretation -- a common CA practice that I happen to abhor), but surely the time and energy expended doing that would be better spent responding to the numerous outstanding incident reports and improving Sectigo's CA operations such that it wouldn't require such sophistry to remain compliant.

"sophistry /ˈsɒfɪstri/ noun the use of clever but false arguments, especially with the intention of deceiving." (https://www.google.com/search?q=sophistry)

Really?

We're absolutely not trying to shirk any of our responsibilities and we most definitely do not intend to deceive anyone! What we're trying to do is to learn from previous revocation delay incidents, to improve automation (and therefore also improve scalability and accuracy), and to be realistic and transparent about how we will accept Certificate Problem Reports going forwards.

It is the CA's prerogative to choose how it will accept Certificate Problem Reports. The BRs do not mandate that CAs must make choices that are maximally convenient for Matt Palmer. Crucially, in our view the BR / Mozilla Policy language does not mandate that the CA must accept problem reports via email, or that the CA must accept all types of problem report at the same contact point.

If you report a key compromise to sslabuse@sectigo.com, then (what our CPS is trying to say is) we will not regard it as a Certificate Problem Report per the BRs, but we will nonetheless handle it as best we can. If you want to send us a Certificate Problem Report for a key compromise, then please do(!), but you will need to follow our "clear instructions" and use either revokeCert or the Revocation Portal.

At the end of the day, though, given that neither of us is anyone special in the Mozilla CA module, I guess we'll have to agree to disagree on this one, and await the judgement of those who do have Super Cow Powers.

If any of those fine folks tell us that in their judgment (either due to how they interpret the BRs, or because they wish to impose rules that are stricter than the BRs) we need to accept all types of Certificate Problem Report at each and every contact point mentioned in our CPS (section 1.5.2.1), then I will propose the following changes (to section 1.5.2.1) to our Policy Authority:

  1. Drop the email contact points (sslabuse@sectigo.com and signedmalwarealert@sectigo.com).
  2. Extend the Revocation Portal webpage so that it accepts all types of Certificate Problem Report.
  3. Remove the mention of ACME revokeCert, because this contact point cannot accept Certificate Problem Reports that are not key compromise reports.

(In reply to Ryan Sleevi from comment #10)

However, I am not sure how to read Sectigo's CPS, v5.2, Section 1.5.2.1, as anything other than supporting revocation via e-mail and committing to timely reports. Contrary to the statements here, it does not appear to carve out the revocation portal/ACME endpoint as being the only path to report revocation. Perhaps that was the intent with the opening of 1.5.2.1, but if that were the case, it would appear Sectigo is failing to provide a method for reporting on the other elements noted within Section 4.9.1 of the BRs.

The intent of Sectigo's CPS, v5.2, Section 1.5.2.1, is to accept all the types of Certificate Problem Report identified by BR 4.9.1, but with the requirement that different types of Certificate Problem Report be sent to different Sectigo contact points.

If the wrong contact point is used, then the report will not reach the Sectigo team or automated system that is ready, willing and able to process it within the BR deadline. For key compromise reports in particular, we do not want our first line Support staff (who handle the email contact points) to be in the critical path, because (as we've seen in previous revocation delay incidents) it has become painfully obvious that they are not suitably adept at verifying such reports and in some cases even realising the need to escalate such reports to more experienced staff. "More training" is not the answer here. One does not simply become a PKI expert.

"Clear instructions" are great, but we can't assume that everybody will read those instructions. And so, even if the wrong contact point is used, we will still seek to process the report to the best of our ability. We think this is more "customer-friendly" than simply ignoring the report or telling the reporter that they need to re-submit their report in a different way. We can't and won't treat such reports as Certificate Problem Reports per the BRs though, for reasons explained in the previous paragraph.

This would appear to be, like Bug 1650234, an inconsistent CPS.

We are more than happy to do some more wordsmithing of Section 1.5.2.1 of our CPS. But first, we would be extremely grateful for feedback regarding whether or not the Mozilla CA Certificates module owner/peer(s) approve of (what I've described above as) our intent. We believe that our intent is compliant with the BRs and the various root program policies, but since this is apparently controversial we are seeking clarification of how others interpret the relevant requirements.

If you "suspect" that a key has been compromised, but you don't actually have a copy of that key, then...what exactly will your report consist of? What proof will you provide? Are you saying that a CA MUST provide a way for third-parties to send "problem reports" (such as "I suspect that the private key corresponding to certificate X is currently being displayed in plaintext at the planning department in Alpha Centauri") that the CA is guaranteed to not be able to verify?

I'm not saying anything. That is from the BRs.

What we're trying to do is to learn from previous revocation delay incidents, to improve automation

It's really hard to square this statement, which I interpret to mean that Sectigo :heart:s automation, with this one:

If any of those fine folks tell us that in their judgment (either due to how they interpret the BRs, or because they wish to impose rules that are stricter than the BRs) we need to accept all types of Certificate Problem Report at each and every contact point mentioned in our CPS (section 1.5.2.1), then I will propose the following changes (to section 1.5.2.1) to our Policy Authority:

  1. Drop the email contact points (sslabuse@sectigo.com and signedmalwarealert@sectigo.com).
  2. Extend the Revocation Portal webpage so that it accepts all types of Certificate Problem Report.
  3. Remove the mention of ACME revokeCert, because this contact point cannot accept Certificate Problem Reports that are not key compromise reports.

Which I interpret to mean that, if the Mozilla CA module owner interprets the BRs in a fashion that does not suit, Sectigo will reduce its problem reporting mechanisms to a single mechanism which cannot be used in an automated fashion by problem reporters. "Sectigo loves automation when we do it, but we're really not a fan when you do it" is... quite the message.

At this point, I think Sectigo's methods of receiving, or not receiving, problem reports are reasonably tailored to the situations as they exist. However, Kathleen and I are monitoring the discussion/situation and will comment/respond further as appropriate or needed.

Whiteboard: [ca-compliance] Next Update 15-July 2020 → [ca-compliance] Next update 15-Aug-2020

For the avoidance of doubt, does your statement mean that Sectigo is, or is not, obligated to action key compromise reports within 24 hours of receiving them via e-mail?

Hi Matt,

I take Rob's response as a commitment from Sectigo that they will not treat emails mentioning revocation as "problem reports". I'm deeply uneasy about this, and the response at the end of Comment #12. I'm particularly concerned about:

For key compromise reports in particular, we do not want our first line Support staff (who handle the email contact points) to be in the critical path, because (as we've seen in previous revocation delay incidents) it has become painfully obvious that they are not suitably adept at verifying such reports and in some cases even realising the need to escalate such reports to more experienced staff. "More training" is not the answer here. One does not simply become a PKI expert.

I have trouble squaring this with accepting other forms of problem reports via e-mail, since the same problems are going to apply here. I'm equally concerned about the dismissal of training, especially since Sectigo has to train all their staff, especially those managing incident reports and operations. I can understand they need experience, but I can imagine a number of other solutions to help build experience (e.g. pairing a junior member with a more experienced lead, ensuring such e-mail contact points for problem reporting go to experienced staff, etc).

It is, understandably, a balance between avoiding an unreasonable burden on CA operations and meeting the needs of relying parties. I'm quite concerned with Sectigo, in particular, given their many failures to respond in a timely fashion, for which there's been zero reasonable explanation. Further, this concern is not directed at one person, but organizationally, for how long it took to begin to see any form of change, which, to the best of my knowledge, has still not been substantively responded to, as documented in Bug 1563579.

As we see from this bug, Sectigo would like us to take it on faith that e-mail is simply too hard to respond to, and only accepting reports via a particular API is truly the "best" solution. I don't think the facts provided support this, nor is there any pattern of high-quality incident reports that might suggest it's appropriate to take Sectigo at their word of the difficulty.

While I had been previously inclined to accept that e-mail reporting and response is complex and challenging, the lack of evidence and documentation about the challenges that support a lack of timely response to problem reports, and the continued failure on incident reports, suggests to me that Sectigo is simply under-invested, rather than it being a systemic ecosystem challenge. Requiring support for e-mail problem reports, for all cases, might be the better balance, given the trend we're seeing.

Flags: needinfo?(rob)

(In reply to Ryan Sleevi from comment 17)

I take Rob's response as a commitment from Sectigo that they will not treat emails mentioning revocation as "problem reports".

I assume what you meant to say here is, "...emails mentioning key compromise..." not "...mentioning revocation..."

That is exactly what we are, or rather were, saying; however, we’re stepping back from that position given the pushback in the responses to this bug. We had based our response and our policy on Mozilla’s incident response guidance, specifically this paragraph:

For example, it’s not sufficient to say that “human error” or “lack of training” was a root cause for the incident, nor that “training has been improved” as a solution. While a lack of training may have contributed to the issue, it’s also possible that error-prone tools or practices were required, and making those tools less reliant on training is the correct solution. When training or a process is improved, the CA is expected to provide specific details about the original and corrected material, and specifically detail the changes that were made, and how they tie to the issue. Training alone should not be seen as a sufficient mitigation, and focus should be made on removing error-prone manual steps from the system entirely.

In particular, this sentence, “Training alone should not be seen as a sufficient mitigation, and focus should be made on removing error-prone manual steps from the system entirely,” argues heavily in favor of, and indeed virtually commands, automation wherever possible, and that’s what we’ve tried to do here. Key compromise is a binary matter. The key is either compromised or it isn’t. You can either present evidence that the key is compromised or you cannot. Those are the only two questions, and if the answer to both is yes, the only possible outcome is to revoke the certificate. That being the case, the process of dealing with key compromise cries out to be automated more than just about anything else that I can imagine.

(Also from comment 17:)

I have trouble squaring this with accepting other forms of problem reports via e-mail, since the same problems are going to apply here. I'm equally concerned about the dismissal of training, especially since Sectigo has to train all their staff, especially those managing incident reports and operations.

Most other problem report types do not require the same level of technical expertise to confirm, nor do they have the large number of possible permutations in the form such reports might take, and none offer themselves up for automation so easily as key compromise. I think Rob's previous comment was a little off the cuff and thus off the mark. We acknowledge that we can and should, and in fact do train our first line staff to handle reports of key compromise. However, even assuming the person looking at a given report is the most senior, smartest, most well-trained person in the company, humans are fallible. We all have bad days, we all make mistakes. I assume that’s why the aforementioned paragraph was written into the Mozilla incident response guidance.

(In reply to Ryan Sleevi from comment 10):

However, I am not sure how to read Sectigo's CPS, v5.2, Section 1.5.2.1, as anything other than supporting revocation via e-mail and committing to timely reports. Contrary to the statements here, it does not appear to carve out the revocation portal/ACME endpoint as being the only path to report revocation.

Ryan, on review we agree that we could have more clearly communicated what had been our intent, and it had been my plan to update our CPS to clarify this language; however, from your response so far you seem to be saying that we must maintain the “error-prone manual steps.” That being the case, I don’t see any point in changing the CPS, since your position is that the current wording requires us to do just that. If I’ve mischaracterized or misunderstood your statements in any way, please let me know.

automation wherever possible

... I think this is where the disconnect lies. Yes, automation over human fallibility is the goal; however, I don't see that pervasive automation is possible when processing Certificate Problem Reports. Sectigo appears to be attempting to turn certificate problem report processing into an "automation is possible" situation by constraining the manner in which problem reports may be made, and opinions appear to differ as to whether or not the constraints that Sectigo are imposing are acceptable. My opinion is that the constraints are not acceptable, for several reasons.

Firstly, there are many ways in which a key can be compromised, and thus there are many ways in which such compromise can be demonstrated. For instance, if I found an unsecured HSM on the Internet, I may not be able to get it to participate in the ACME revokeCert dance, or run openssl on it (to generate Sectigo's "revocation token"), but I could still get it to sign an arbitrary CSR or some other artifact with a key that the HSM manages. Is that key compromised? I hope we'd all agree that it was. And yet, Sectigo's CPS does not currently accept such a proof of key compromise, and hence they would not be required to revoke certificates containing that key. I don't see how that is a net positive for the Web PKI, and I feel that Sectigo's continued insistence on refusing to accept such reports casts grave doubts on their trustworthiness.

Secondly, the BRs require that CAs accept reports of suspected key compromise (4.9.3), not just "reports of key compromise that can be proven via ACME revokeCert". I cannot see how one would report a suspected key compromise using Sectigo's current CPS. Yes, reviewing a report of suspected key compromise may require skills and experience beyond those of a front-line customer service person. Such is the nature of the beast -- part of systems design is including suitable "exception handling" mechanisms that allow for dealing with circumstances outside of the ordinary. In the case of human-centred systems, escalations are an integral part of that. If processing reports of suspected key compromise is unduly burdensome, I'm sure that the CA/B Forum and root programs would like to see the evidence of those unduly onerous suspected key compromise investigations, so that the BRs can be updated to match.
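
As a rough illustration of the "exception handling" idea above, here is a minimal sketch of report routing in which automation handles what it can and everything else is escalated rather than dropped. The channel names, report kinds, `ProblemReport` type and `route` function are all hypothetical, invented for this sketch; they do not describe Sectigo's actual systems.

```python
from dataclasses import dataclass

# Hypothetical channel identifiers for the automated contact points.
AUTOMATED_CHANNELS = {"acme_revokecert", "revocation_portal"}


@dataclass
class ProblemReport:
    channel: str                       # e.g. "email", "acme_revokecert"
    kind: str                          # e.g. "key_compromise", "suspected_key_compromise"
    has_machine_checkable_proof: bool  # e.g. a signature made with the reported key


def route(report: ProblemReport) -> str:
    """Send a report to automation only when it can be fully verified there;
    otherwise escalate it to a human instead of ignoring it."""
    if (
        report.kind == "key_compromise"
        and report.channel in AUTOMATED_CHANNELS
        and report.has_machine_checkable_proof
    ):
        return "automated"
    # Suspected compromise, unusual proof formats, or reports arriving by
    # email all still count as problem reports and need a person to look.
    return "escalate_to_experienced_staff"


print(route(ProblemReport("email", "suspected_key_compromise", False)))
```

The point of the sketch is only that the fallback branch exists: a report that automation cannot handle is escalated, not discarded.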

I do not accept that key compromise reports are necessarily more difficult to understand than other forms of certificate problem report, either.
Consider a problem report along the lines of https://bugzilla.mozilla.org/show_bug.cgi?id=1639502, but for a certificate rather than an OCSP response. Are Sectigo's front-line personnel sufficiently versed in the subtleties of DER, such that they would all be able to understand the nature of the problem, and confidently act in accordance with the BRs in responding to such a report?

Finally, the BRs require investigation of all problem reports within 24 hours, with a report of the investigation made to the submitter within that time period.

"investigation /ɪnˌvɛstɪˈɡeɪʃ(ə)n/ noun the action of investigating something or someone; formal or systematic examination or research." (https://www.google.com/search?q=investigation)

That doesn't sound like the kind of thing that one can necessarily automate away. For those reports that can be automatically investigated and handled, great. However, that a CA can automate some problem reporting does not imply that a CA can ignore any other problem reports, nor does it mean that the CA can rewrite their CPS such that they straight-up refuse to abide by the BRs for some problem reports.
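
To make the 24-hour window concrete, the following minimal sketch computes how far past the deadline a revocation landed, using the two timestamps given in the opening report of this bug (report delivered 2020-05-13 21:50:05 UTC, revocation effected 2020-05-15 00:33:00 UTC). The function name and the 24-hour constant as a parameter are illustrative choices, not anything taken from the BRs' text or a CA's tooling.

```python
from datetime import datetime, timedelta, timezone

# Deadline window discussed in this bug.
REVOCATION_WINDOW = timedelta(hours=24)


def revocation_overrun(reported_at, revoked_at, window=REVOCATION_WINDOW):
    """Return how long after the deadline revocation occurred
    (zero if it happened within the window)."""
    elapsed = revoked_at - reported_at
    return max(elapsed - window, timedelta(0))


# Timestamps as stated in the original report on this bug.
reported_at = datetime(2020, 5, 13, 21, 50, 5, tzinfo=timezone.utc)
revoked_at = datetime(2020, 5, 15, 0, 33, 0, tzinfo=timezone.utc)

overrun = revocation_overrun(reported_at, revoked_at)
print(overrun)  # 2:42:55
```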

(In reply to Rich Smith from comment #18)

I think Rob's previous comment was a little off the cuff and thus off the mark. We acknowledge that we can and should, and in fact do train our first line staff to handle reports of key compromise.

+1. This training of our first line staff over the past few months has been more successful than I had anticipated or was aware of when I wrote comment #13.

(In reply to mpalmer from comment #19)

automation wherever possible

... I think this is where the disconnect lies...

Matt, I think these are all now moot points as far as this incident bug is concerned.

and I feel that Sectigo's continued insistence on refusing to accept such reports

You seem to have misunderstood comment #18, in which Rich wrote that the pushback in the responses to this bug has caused us to step back from our previous position.

In other words, we are now once again accepting key compromise Problem Reports via email.

We encourage reporters to use our Revocation Portal and/or ACME revokeCert endpoint whenever possible. But also, as mentioned a couple of days ago in https://bugzilla.mozilla.org/show_bug.cgi?id=1650845#c5 and https://bugzilla.mozilla.org/show_bug.cgi?id=1648717#c10, we are looking at how we can improve our customer-service systems; there will almost certainly be things we can do to help our first line staff deal with key compromise reports that cannot be (fully) automated.

(In reply to Ben Wilson from comment #15)

At this point, I think Sectigo's methods of receiving, or not receiving, problem reports are reasonably tailored to the situations as they exist. However, Kathleen and I are monitoring the discussion/situation and will comment/respond further as appropriate or needed.

Ben, do you think this bug can now be closed?

Flags: needinfo?(rob)

I'll close this on or about 12-Aug-2020 unless further questions or issues are brought forth.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update 15-Aug-2020 → [ca-compliance] [leaf-revocation-delay]