Closed Bug 1694233 Opened 5 years ago Closed 4 years ago

Sectigo: Inadequate DCV

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tim.callan, Assigned: tim.callan)

References

Details

(Whiteboard: [ca-compliance] [dv-misissuance])

Attachments

(1 file)

Attached file DCV_error.txt
  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

We received a report from one of our partners indicating that under certain circumstances we could issue an SSL certificate for which Domain Control Validation (DCV) had not properly occurred.

The flaw occurred when a subscriber ordered a single-domain certificate for a www subdomain (e.g. www.example.com), elected the email validation method of DCV, and chose a constructed email address on the www subdomain (e.g. postmaster@www.example.com) without supplying an email address for the apex domain as well. As a courtesy to our Subscribers, for many years Sectigo (and previously Comodo) has included the apex domain (example.com) in a SAN for every order of a single-domain certificate for a www subdomain.
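The flawed flow can be sketched roughly as follows. This is a hypothetical illustration, not Sectigo's actual code; all function and variable names are assumptions:

```python
# Hypothetical sketch of the pre-fix behavior described above; not
# Sectigo's actual code. Names and structure are assumptions.

BUGGY_EMAIL_DCV = True  # behavior before the February 11 fix

def sans_for_order(requested_fqdn):
    """Single-domain orders for a www subdomain also receive the apex
    domain as a courtesy SAN."""
    sans = [requested_fqdn]
    if requested_fqdn.startswith("www."):
        sans.append(requested_fqdn[len("www."):])
    return sans

def domains_covered_by_email_dcv(chosen_address, sans):
    """Constructed-email DCV proves control only of the domain the
    address was built on (and its subdomains) -- never its parent."""
    proven = chosen_address.split("@", 1)[1]
    if BUGGY_EMAIL_DCV:
        # Bug: the email DCV path treated every SAN in the order as
        # covered, including the apex domain, even though proving
        # control of a subdomain does not authorize its parent.
        return set(sans)
    return {d for d in sans if d == proven or d.endswith("." + proven)}

sans = sans_for_order("www.example.com")   # ['www.example.com', 'example.com']
covered = domains_covered_by_email_dcv("postmaster@www.example.com", sans)
# Pre-fix: 'example.com' is wrongly treated as validated.
```

With `BUGGY_EMAIL_DCV` set to `False`, the same order would leave the apex domain uncovered, which is the behavior the February 11 fix enforces.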

The partner sent a personal email directly to Rob Stradling and me at 03:36 UTC on Wednesday, February 10 pointing out this anomalous behavior. We received the report upon opening email at the start of the work day. We looked into the matter the same day and verified the reported behavior.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

Wednesday, February 10 03:36 UTC

  • Message sent to email inboxes of two individual employees

Wednesday, February 10 11:45 UTC

  • Message opened and viewed

  • Investigation conducted, concluding that it is possible to obtain a certificate for which DCV had not been conducted for an apex domain SAN, as detailed in 1, above

  • Commenced work on fixing this behavior

Thursday, February 11 02:05 UTC

  • Fixed code deployed and verified

Tuesday, February 16

  • Began investigation for affected certificates

Thursday, February 18 20:00 UTC

  • Completed list of 1548 affected certificates

  • Sent notifications to subscribers

Friday, February 19 22:03 UTC

  • Revoked all identified certificates

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

On Thursday, February 11 we deployed code preventing the issuance of certificates with incomplete DCV for apex domains. For certificate orders meeting the criteria described in 1, above, we now issue a certificate that does not contain the apex domain in a SAN. Customers wishing to have this domain included can do so by choosing a different authentication method or providing a validation address for the apex domain.
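The corrected issuance check can be sketched minimally as follows. This is an illustration only, not Sectigo's real code; the function name is an assumption:

```python
# Illustrative sketch of the post-fix behavior: the issued certificate
# contains only domains for which DCV actually completed.

def issuable_sans(requested_sans, validated_domains):
    """Drop any requested SAN whose DCV did not complete."""
    return [d for d in requested_sans if d in validated_domains]

# Order for www.example.com with the courtesy apex SAN, but constructed
# email DCV performed only at postmaster@www.example.com:
sans = issuable_sans(["www.example.com", "example.com"], {"www.example.com"})
# The unvalidated apex domain is omitted from the issued certificate.
```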

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

1522 active certificates

The earliest of these was issued February 6, 2018.

The last was issued February 8, 2021.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

We have attached a list of crt.sh links for the 1522 affected certificates.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

Comodo had a longstanding practice of including both the www subdomain and the “bare” apex domain for any single-domain certificate ordered for a www subdomain. This was in recognition of the common practice of including the same home page content on both these domains. We feel this is a good, helpful practice and continue it.

Comodo historically had viewed email DCV of the www subdomain as adequate for authenticating the apex domain as well, in light of how www subdomains are used. With the passage of Ballot 190 in September 2017, Comodo judged that this practice was no longer allowed according to the BR. Comodo updated our HTTP-based and DNS-based DCV checker code to remove the apex domain SAN from certificates for which DCV had occurred only for the www subdomain.

However, it appears that the task of updating our email DCV code in a similar fashion was not identified at that time, and so our systems still issued a certificate containing the additional SAN even under these prohibited circumstances. We became aware of this behavior only when a partner reported it to us.

As a further wrinkle, we employed our Bulk Revocation engine to revoke these certificates. As explained in bug 1648717, we have been developing and updating our bulk revocation engine for several months. Despite previous successful behavior, on Friday, February 19, the bulk revocation engine failed to fire correctly. After trying and failing to “unjam” the revocation event, we had to scramble technical resources to write and fire a script to revoke these certificates. Therefore the certificates were not actually revoked until roughly 26 hours after we obtained our list of affected certs.

Due to the nature of the original error it is highly doubtful that the additional two hours enabled any fraudulent or malicious activity. Nonetheless, this software error did cause us to miss our 24-hour window.

We have since identified the problem and developed a fix. This fix is scheduled for deployment on March 7. We also have developed a script to duplicate the engine’s functionality. Should a bulk revocation be required prior to that date, we will employ this script.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

The erroneous issuance behavior was fixed and deployed on February 11.

We have identified the problem with the Bulk Revocation Engine and developed a fix. This fix is scheduled for deployment on March 7.

Type: defect → task
Assignee: bwilson → tim.callan
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Can you help me understand how this differs from Sectigo's (then Comodo's) previous incident which appears to be nearly identical in scope and issue?

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #1)

Can you help me understand how this differs from Sectigo's (then Comodo's) previous incident which appears to be nearly identical in scope and issue?

They are different in both scope and nature.

The issue you refer to goes back to September 2016. In that case Comodo was occasionally including "dNSName=<tld>" as an additional FQDN in single-domain certificates when the company had only validated control of "www.<tld>." The bug was that some of our code wrongly assumed that a leading "www" domain component must be a subdomain and could never be part of the registered domain name.

This current bug points to an oversight in actions we took in September 2017 when adjusting for tightened rules around DCV.

September 2016: INCIDENT (https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg04274.html) and BUGFIX.
Comodo stopped including "dNSName=<tld>" when (due to the bug) it had only validated control of "<domain.tld>" and "domain" happened to be "www". However, at that time Comodo felt that it was reasonable to continue (as an "any other method") treating control of "www.<domain.tld>" as proof of control of "<domain.tld>."

September 2017: Non-incident; regular code change to comply with the abolition of "any other method."
"With the passage of Ballot 190... Comodo updated our HTTP-based and DNS-based DCV checker code," as I stated above. Comodo stopped including "dNSName=<domain.tld>" when it had only validated control of "[http://www.%3cdomain.tld%3e][http://www.%3cdomain.tld]www.<domain.tld>" via HTTP-based or DNS-based DCV. At that time Comodo unknowingly failed to update its behavior for the circumstances set out at the top of this thread. That is the root cause of this incident.

February 2021: INCIDENT (https://bugzilla.mozilla.org/show_bug.cgi?id=1694233#c0) and BUGFIX.
We stopped including "dNSName=<domain.tld>" when we had only validated control of "www.<domain.tld>" via Constructed Email-based DCV. As we explained in yesterday's incident report, Comodo should have identified this as a regular development task back in September 2017 but failed to do so.

Flags: needinfo?(tim.callan)

I see striking similarities though, and I'm not sure Comment #2 fully assuages the concern.

From the September 2016 discussion, there was a clear discussion about the need to validate "all" domains present in a cert. That was in the context of Ballot 169, which would eventually become Ballot 190, but even then, the discussion was about the need for validation.

My understanding of how this incident compares to that incident is: "In that incident, Comodo wasn't ensuring every domain was validated, and was also routinely adding additional domains." and "In this incident, Sectigo wasn't ensuring every domain was validated, and is routinely adding additional domains.".

Had Sectigo stopped the addition of the extra "www" then, as suggested, it seems like this could have been detected sooner and/or possibly prevented, by making it clearer to the engineering team what domains had been validated. Equally, it seems the choice to add the "www." domain interacted poorly with Sectigo's user interface for Applicants, by allowing them to select a subordinate domain to prove validation (the "www."), even if they'd originally requested at the base domain. Specifically, from the original description:

elected the email validation method of DCV, and chose a constructed email address on the www subdomain (e.g. postmaster@www.example.com) without supplying an email address for the apex domain as well.

If we imagine a world in which Sectigo did not require the customer to select the address to use, but instead selected it based on the request, it seems like this could have been avoided.

Equally, I think it's useful to understand the process, especially with respect to the BR Self-Assessments required, about how this evaded detection for so long. I can appreciate the review at the time was inadequate, but I'd like to better understand how it went missed for four years, and what process improvements are being done to ensure compliance and detect potential issues.

Put more directly: I do not feel, based on Comment #0, that the responses to Question #6 and Question #7 do anything to inspire confidence that this issue will not be repeated. The analysis here has seemingly solely focused on the specific details of the issue, without examining the systemic factors that contributed to it, the alternatives that "may" have prevented it, or what systemic improvements can be done to improve detection of non-compliance.

It is what I would expect of an incident report focused on fixing the bug, but it does not seem to be a systemic root cause analysis. I'm hoping you can think about this from a broader perspective of systems, and how any and every change to the BRs could similarly result in the same outcome, and think about what layers Sectigo has to ensure it doesn't happen again.

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #3)

Had Sectigo stopped the addition of the extra "www" then, as suggested, it seems like this could have been detected sooner and/or possibly prevented, by making it clearer to the engineering team what domains had been validated. Equally, it seems the choice to add the "www." domain interacted poorly with Sectigo's user interface for Applicants, by allowing them to select a subordinate domain to prove validation (the "www."), even if they'd originally requested at the base domain. Specifically, from the original description:

elected the email validation method of DCV, and chose a constructed email address on the www subdomain (e.g. postmaster@www.example.com) without supplying an email address for the apex domain as well.

If we imagine a world in which Sectigo did not require the customer to select the address to use, but instead selected it based on the request, it seems like this could have been avoided.

I'd like to better understand this issue, as in my reading of the incident report I see that the validation of "www.domain.com" could result in an implicit validation of "domain.com", so I think the point here is not that the "www" is added.

(In reply to Pedro Fuentes from comment #4)

I'd like to better understand this issue, as in my reading of the incident report I see that the validation of "www.domain.com" could result in an implicit validation of "domain.com", so I think the point here is not that the "www" is added.

As described, it's the fact that "www.domain.example" had "domain.example" added, and vice-versa, "domain.example" had "www.domain.example" added, but then the user is prompted to select which method of verification to use. So the point is very much that this addition was happening, and then further, from the set of requested domains, the customer was allowed to choose how to construct the verification mail to authenticate.

(In reply to Ryan Sleevi from comment #5)
For the sake of clarity: We add the apex domain ("domain.example") to the order for the www subdomain ("www.domain.example"). We do not add the www subdomain to an order for the apex domain.

Flags: needinfo?(tim.callan)

(In reply to Tim Callan from comment #6)
I made an error with this comment. After looking into it, we actually do it both ways: By default a single-domain certificate order for example.com will also come with www.example.com, and such an order for www.example.com will come with example.com. This is contrary to what I stated. I misinterpreted some information I received as we were looking into this bug.

In terms of the behavior detailed in this bug, the error did NOT occur where the initial request was for the apex domain. In that case, for verifying control of the domain, we offer only constructed addresses using the apex domain. E.g. admin@example.com but not admin@www.example.com.

(In reply to Tim Callan from comment #2)
While I'm correcting my own errors, it looks like something I used made a hash of the links in the earlier post. I didn't notice until it was too late. Here is the content of the post again with links corrected. Hopefully this is easier to read and make sense of.


(In reply to Ryan Sleevi from comment #1)
Can you help me understand how this differs from Sectigo's (then Comodo's) previous incident which appears to be nearly identical in scope and issue?

They are different in both scope and nature.

The issue you refer to goes back to September 2016. In that case Comodo was occasionally including "dNSName=<tld>" as an additional FQDN in single-domain certificates when the company had only validated control of "www.<tld>." The bug was that some of our code wrongly assumed that a leading "www" domain component must be a subdomain and could never be part of the registered domain name.

This current bug points to an oversight in actions we took in September 2017 when adjusting for tightened rules around DCV.

September 2016: INCIDENT (https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg04274.html) and BUGFIX.
Comodo stopped including "dNSName=<tld>" when (due to the bug) it had only validated control of "<domain.tld>" and "domain" happened to be "www". However, at that time Comodo felt that it was reasonable to continue (as an "any other method") treating control of "www.<domain.tld>" as proof of control of "<domain.tld>."

September 2017: Non-incident; regular code change to comply with the abolition of "any other method."
"With the passage of Ballot 190... Comodo updated our HTTP-based and DNS-based DCV checker code," as I stated above. Comodo stopped including "dNSName=<domain.tld>" when it had only validated control of "www.<domain.tld>" via HTTP-based or DNS-based DCV. At that time Comodo unknowingly failed to update its behavior for the circumstances set out at the top of this thread. That is the root cause of this incident.

February 2021: INCIDENT (https://bugzilla.mozilla.org/show_bug.cgi?id=1694233#c0) and BUGFIX.
We stopped including "dNSName=<domain.tld>" when we had only validated control of "www.<domain.tld>" via Constructed Email-based DCV. As we explained in yesterday's incident report, Comodo should have identified this as a regular development task back in September 2017 but failed to do so.

(In reply to Ryan Sleevi from comment #3)
Pardon the delay in getting back to you on this one. We all had a busy week last week and I needed an opportunity to organize my thoughts.

I see striking similarities though, and I'm not sure Comment #2 fully assuages the concern.

From the September 2016 discussion, there was a clear discussion about the need to validate "all" domains present in a cert. That was in the context of Ballot 169, which would eventually become Ballot 190, but even then, the discussion was about the need for validation.

My understanding of how this incident compares to that incident is: "In that incident, Comodo wasn't ensuring every domain was validated, and was also routinely adding additional domains." and "In this incident, Sectigo wasn't ensuring every domain was validated, and is routinely adding additional domains.".
Okay, but don’t lose sight of the fact that they had different behaviors, different root causes, and different fixes. Despite some surface similarities, the two problems were very different beasts.

Had Sectigo stopped the addition of the extra "www" then, as suggested, it seems like this could have been detected sooner and/or possibly prevented, by making it clearer to the engineering team what domains had been validated.
It would not have been detected at all, as it would not have existed. I don't imagine you're suggesting, however, that the best approach to ensuring software quality is the elimination of any piece of functionality that might imaginably contain a heretofore unknown bug.

Equally, it seems the choice to add the "www." domain interacted poorly with Sectigo's user interface for Applicants, by allowing them to select a subordinate domain to prove validation (the "www."), even if they'd originally requested at the base domain. Specifically, from the original description:

elected the email validation method of DCV, and chose a constructed email address on the www subdomain (e.g. postmaster@www.example.com) without supplying an email address for the apex domain as well.

If we imagine a world in which Sectigo did not require the customer to select the address to use, but instead selected it based on the request, it seems like this could have been avoided.
To better enable validation, we make it as easy as possible for the subscriber to perform the steps necessary to prove control. In this case we present a series of acceptable constructed addresses and allow the choice of one or more.

Attempting to make this choice on the customer’s behalf is problematic. Let’s say a customer requests a cert for x.y.z.com. What DCV email address do we select? admin@x.y.z.com? postmaster@y.z.com? webmaster@z.com?

We cannot know. So we either have to (1) spam every possible constructed email address, or (2) ask the customer to tell us which of the permitted constructed email addresses they'd like to use.

It’s a good service when it works correctly. The problem here is that it didn’t work correctly, not that we gave subscribers this choice.
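The ambiguity described above can be made concrete with a sketch of the candidate set under BR section 3.2.2.4.4 (Constructed Email to Domain Contact). This is an illustration under simplifying assumptions: the registrable domain is passed in directly, whereas real code would consult the Public Suffix List, and the function assumes the FQDN ends with the registrable domain:

```python
# Illustrative sketch: the BRs permit five local parts at the requested
# FQDN and at each parent domain down to the registrable domain, so the
# CA cannot guess which mailbox the applicant actually controls.

LOCAL_PARTS = ["admin", "administrator", "webmaster", "hostmaster", "postmaster"]

def constructed_addresses(fqdn, registrable_domain):
    """Every constructed email address permitted for this FQDN."""
    labels = fqdn.split(".")
    reg_labels = registrable_domain.split(".")
    addresses = []
    # Walk from the FQDN up to (and including) the registrable domain.
    while len(labels) >= len(reg_labels):
        domain = ".".join(labels)
        addresses += [f"{lp}@{domain}" for lp in LOCAL_PARTS]
        labels = labels[1:]
    return addresses

addrs = constructed_addresses("x.y.z.com", "z.com")
# 5 local parts x 3 authorization domains = 15 candidate addresses.
```

Enumerating the 15 candidates for "x.y.z.com" illustrates why the CA either spams every permitted address or asks the customer to choose one.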

Equally, I think it's useful to understand the process, especially with respect to the BR Self-Assessments required, about how this evaded detection for so long. I can appreciate the review at the time was inadequate, but I'd like to better understand how it went missed for four years, and what process improvements are being done to ensure compliance and detect potential issues.

Put more directly: I do not feel, based on Comment #0, that the responses to Question #6 and Question #7 do anything to inspire confidence that this issue will not be repeated. The analysis here has seemingly solely focused on the specific details of the issue, without examining the systemic factors that contributed to it, the alternatives that "may" have prevented it, or what systemic improvements can be done to improve detection of non-compliance.

It is what I would expect of an incident report focused on fixing the bug, but it does not seem to be a systemic root cause analysis. I'm hoping you can think about this from a broader perspective of systems, and how any and every change to the BRs could similarly result in the same outcome, and think about what layers Sectigo has to ensure it doesn't happen again.

This is a big topic. It’s not too far from asking, “What are you doing to keep bugs out of your production systems?” The response to that is vast and varied and impossible to properly cover in a forum like this one.

Why didn’t we know to look for something that we didn’t know to look for? That’s an intractable problem for any technology product or service. Just about every bug that has shipped in the history of computers could have been avoided if someone had simply known to look for something they didn’t know to look for.

In 2017 someone mis-scoped or misunderstood the development task to update DCV to match new baseline requirements. Comodo didn’t realize this error had been made, or presumably it would have done something at the time. Because this error was unknown, the resulting DCV misissuance was not visible. As DCV is fully automated, it’s a hard thing for internal audit to spot. It’s one of those tricky errors.

What are we doing to ensure quality? That’s a completely different discussion with tremendous scope. The short answer is we’ve done a great deal since the carve-out on November 1, 2017 and are continually working on further improvement.

I just attended a day-long technology leadership summit, of which we hold several a year, and one of the topics was our complex set of legacy systems and potential strategies for simplifying and modernizing technical operations. Another topic was the best way to organize our product and development teams for effective development. And so on. It’s difficult to answer that broad of a question in a few paragraphs, so I hope it’s an acceptable response to say that a lot of energy goes into best practices for development, one consequence of which is reduced likelihood of the kind of problem we see here.

We have nothing to add right now.

We don't have any more updates at the moment.

My comment #9 was an attempt to discuss the broader implications of this error and what Sectigo is doing to ensure quality on an ongoing basis. Obviously this is a big topic, as that comment acknowledges, and I tried to do it justice within the constraints of a forum like this one.

That now was a month ago. I haven't seen any additional questions from the community on that comment. If there are such questions, of course we will respond.

I can understand this is a complex topic, but I had hoped Comment #9 was an opener to be followed up with actual substance. It seems Sectigo is spending a lot of time thinking about things, but it's not clear that there is any improved understanding or clear commitments to improve that can be quantified. "We're getting better" is something that is neither here nor there in terms of substance.

The issue you touched on is that there is an ineffable set of compliance issues in Sectigo's infrastructure, and that concrete steps, like an analysis of all the code in the issuance pipeline, an examination of all conditional flows and branches against compliance requirements, or a concrete plan to replace the legacy systems, will not be forthcoming.

I get that the "unknown unknown" bugs are hard to know beforehand, and hindsight is 20-20, but my hope would have been a more systemic evaluation of the systems, especially validation systems, to make sure Sectigo has a clear and unambiguous understanding of every flow, branch, and error condition in its domain validation pipeline, and ensured that they comply.

I'm setting N-I for Ben, because while I think this is still something important for Sectigo to implement, I'm not optimistic it would or could be done "soon", even if it should already have been done, and I don't think it needs to block closing this issue.

Flags: needinfo?(bwilson)

I agree with Ryan's observations. I'll call this up next Wed. 14-Apr-2021 to close this, but I'd urge Sectigo to continue a diligent effort to simplify and modernize its technical operations and implement best practices in system development.

I certainly didn't mean to suggest that Sectigo is not systematically analyzing our processes and code.  We have been doing so in one form or another since we broke free from Comodo.

Rather, the challenge with a question like that in comment #3 is how to capture and explain it all.  We have many functions with their own flavors of expertise, with their own missions, criteria, and focal areas.  For example, our head of security is heavily focused on how to prevent penetration of our systems, but he’s not thinking about authentication procedures.  While for our head of authentication it’s the opposite.  A leader in product management, development, customer service, or infrastructure – to name a few – will each have a different perspective on what we are doing to improve quality, accuracy, and consistency.  Ask about the initiatives they are driving or find important, and each will give you a somewhat different answer.  None of them will be wrong.  It’s just that for any large technology organization, the answer to such questions is very, very complex.

When I tried to express that point at a high level, you found that dissatisfying.  In the case of bug 1698936 I wrote more than 700 words on what was a very narrow topic, which apparently was to the community’s satisfaction.  Perhaps I should have had the foresight to limit the scope of my response down to the specific matter of this bug: DCV.  Ryan, I see you have done so in comment #13, where you say, “my hope would have been a more systemic evaluation of the systems, especially validation systems, to make sure Sectigo has a clear and unambiguous understanding of every flow, branch, and error condition in its domain validation pipeline, and ensured that they comply” (emphasis mine).

So let’s take that angle instead.  I have secured roadmap commitment to conduct a full review of our DCV components, software, and process.  This is a subset of the larger quality initiative I mentioned in paragraph 2 of this comment and in comment #9.  Think of this as a focusing and a reprioritization of our ongoing quality efforts rather than a new effort with new resources required.  These tasks include but may not be limited to:

  • conduct a fresh review of the current BRs, EV Guidelines, and browser root policies to confirm our current understanding of DCV rules
  • review each DCV method we use for compliance and effectiveness
  • deprecate any DCV method we find to be non-compliant (none expected) and revoke any affected certs
  • consider deprecating DCV methods that aren’t used frequently (for simplicity of code and ongoing quality maintenance)
  • consider adding DCV methods we don’t presently use, should we find good reason to do so
  • review for correctness all code that performs DCV checks
  • review all code that relates to reuse of previously performed DCV checks
  • look for opportunities to tidy up / simplify DCV-related code
  • report any BR violations discovered in the above and act on any affected certificates in the specified manner and timeline

As this does represent a reallocation of effort, we should consider all of the above to be in front of us.  I can’t tell you today how long it will take, but we’ll be breaking these up and tackling them in pieces, and we will report on incremental progress as it occurs.  We suggest leaving this bug open so that as we report on our progress, we can keep the full conversation in a single place.  For this particular bug we do request an exception to the guideline to report weekly, so we can all skip the “no new report” cadence.  We instead would like to report in here when we have new relevant information and of course to respond to the community’s comments.

Ben, is that acceptable?

Thanks, Tim.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED

We have established a project team to execute on the goals stated above. It consists of a project manager, the engineers working on tasks, Inigo Barreira, Nick France, and Rob Stradling. Team meetings occur weekly. We will target resolving one method per week, but we will give each method as much time as it needs for correct resolution.

This team will start with its initial review of the CA/Browser forum rules for DCV. We will be able to scrutinize them as we go deep on individual DCV methods.

The project team had its first meeting on April 27. The team reviewed all the current, allowed DCV methods. We have identified for consideration potential methods that are permitted in the BRs but not presently part of our included set of validation methods. The team assigned action items to begin reviewing DCV methods and code. I expect an update on Friday of this week and will follow up early next week with the results of that update.

For one of its first steps, the project team reviewed our CPS and discovered that we had not kept our list of supported DCV methods and their corresponding BR section numbers fully up to date. The team drafted an update which is presently in internal review.

With the recent creation of a thread on this topic at https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/CDal5qSIYvE, we feel it’s timely to bring this up now rather than accompanying the publication of the updated CPS, as we’d originally planned. We will open a new incident bug and provide a full incident report.

At a high level we’ve broken down our action plan into these stages:

IDENTIFY ACTIVE DCV METHODS
Complete.
We reviewed all certificates issued in the past three months and identified the full set of DCV methods used for public certificates in that time period (see below). Due to our high issuance volume, this time period is sufficient to identify the methods for nearly all (or perhaps all) issued certificates. Later in this process we will review the entire active certificate base to ensure no other, active methods slipped through.
Note that we do offer additional methods for DCV of private TLS certificates if the customer chooses them, such as pre-approved IP addresses. We reconfirmed that these are disabled for public TLS certificates.

ANALYZE ACTIVE DCV METHODS
Underway.
For each DCV method we will serially examine three things:

  1. Document our existing process for that method, including,
    a) The detailed process we follow
    b) Which defined method in the BRs this falls under
    c) How domains are added to the certificate after DCV completion
    d) Re-use of DCV
    e) Recording of DCV activity

  2. Evaluate process compliance with CABF requirements

  3. Perform a code review on that method
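Items 1(c) and 1(d) above can be illustrated as a single pre-issuance gate: before signing, confirm that every dNSName in the request has a completed DCV record that is still inside the re-use window. This is a minimal sketch under assumed names and a simplified exact-match model (a real implementation must also consider Authorization Domain Names), not Sectigo's actual pipeline:

```python
from datetime import date, timedelta

REUSE_WINDOW = timedelta(days=398)  # BR cap on re-use of domain validation data

def uncovered_sans(sans, dcv_completed, as_of):
    """Return the SANs that lack a completed, unexpired DCV.

    dcv_completed maps each validated FQDN to the date its DCV finished.
    A SAN added automatically -- e.g. the apex domain added alongside a
    www subdomain -- must be covered here like any other SAN.
    """
    return [
        san for san in sans
        if san not in dcv_completed or as_of - dcv_completed[san] > REUSE_WINDOW
    ]

# Mirroring the reported flaw: DCV completed only for the www subdomain,
# while the apex domain was added to the certificate as a courtesy SAN.
DCV_COMPLETED = {"www.example.com": date(2021, 2, 1)}
```

With these inputs, `uncovered_sans(["www.example.com", "example.com"], DCV_COMPLETED, date(2021, 2, 10))` flags `example.com` as lacking validation, which is exactly the condition the original report describes.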

We are dividing our analysis of each method between single-domain (SDC) and multi-domain (MDC) certificates because, for mainly historical reasons, these follow different code paths in our systems. DCV methods slated for analysis include:

3.2.2.4.2 Email, Fax, SMS, or Postal Mail to Domain Contact
3.2.2.4.4 Constructed Email to Domain Contact
3.2.2.4.7 DNS Change
3.2.2.4.8 IP Address
3.2.2.4.18 Agreed Upon Change to Website v2
3.2.2.4.19 Agreed Upon Change to Website ACME
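Method 3.2.2.4.4 is the one implicated in the original report. The BRs fix the constructed local parts at admin, administrator, webmaster, hostmaster, and postmaster; a minimal sketch of address construction follows (the domain names are illustrative):

```python
# Local parts permitted for constructed emails under BR 3.2.2.4.4.
CONSTRUCTED_LOCAL_PARTS = ("admin", "administrator", "webmaster",
                           "hostmaster", "postmaster")

def constructed_addresses(authorization_domain):
    """Constructed email addresses for one Authorization Domain Name."""
    return [f"{lp}@{authorization_domain}" for lp in CONSTRUCTED_LOCAL_PARTS]

# A constructed email on the www subdomain (e.g. postmaster@www.example.com)
# demonstrates control of that subdomain only; the apex domain needs its own
# validation, or the subdomain's DCV must be performed against the apex as
# the Authorization Domain Name.
```

The code comment restates the crux of this incident: which domain the constructed address is built on determines which domain the DCV actually covers.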

IDENTIFY UNUSED DCV METHODS
Underway.
Create a list of allowed DCV methods not identified above. Depending on that list, we may undertake an additional investigation to determine whether adding methods to our process is desirable.

DEFINE DCV METHODS TO DEPRECATE OR IMPLEMENT
Not started.
Use the results from the earlier steps to determine whether to deprecate any existing, active methods or add new methods to our available options. Create an individual ticket for each method to be deprecated or added.

MAKE CHANGE REQUESTS
Ongoing.
If, in the process defined here, we identify bugs, improvements, or other desired changes to the methods discussed above, we will create separate tickets to implement those changes. We have already created a few tickets for improvements to our internal processes.

REVIEW AND UPDATE CPS
Ongoing.
In light of the information gathered above, review our published CPS for accuracy. Where we decide to change our DCV methods, update the CPS to reflect those changes.

UPDATE DCV PROCESS DOCUMENTATION
Not started.
Review our internal procedural documentation, training material, and KB articles for errors or outdated information. Update documentation to reflect any changes resulting from this project.

(In reply to Tim Callan from comment #19)

> With the recent creation of a thread on this topic at https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/CDal5qSIYvE, we feel it’s timely to bring this up now rather than accompanying the publication of the updated CPS, as we’d originally planned. We will open a new incident bug and provide a full incident report.

Checking: Have I missed a bug being filed? Between that and Comment #20 being a three-week difference, I'm hoping this isn't a sign that Sectigo is once again regressing Bug 1563579

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #21)

> Checking: Have I missed a bug being filed?
You have not. Our response is very actively underway. We posted our notice early to make sure the community knew about it as it touched on a current topic. We published a new CPS on May 21, and we are presently working to understand the scope and nature of resulting misissuance.

> Between that and Comment #20 being a three-week difference, I'm hoping this isn't a sign that Sectigo is once again regressing Bug 1563579
In comment #15, I did request,
> For this particular bug we do request an exception to the guideline to report weekly, so we can all skip the “no new report” cadence. We instead would like to report in here when we have new relevant information and of course to respond to the community’s comments.
As the bug was subsequently closed, we have been of the belief that filing useful information as it becomes available is the right course of action for this bug, rather than looking specifically for a weekly cadence.

And I hope it's clear we are very active on Bugzilla, as opposed to what you saw when bug #1563579 was written up. Since comment #28 we have posted more than 7500 words across six bugs. Most of these are complex in the investigation and response they require. Even the one I expected to be straightforward (bug #1712120) has left us with a significant homework assignment.

There’s a lot going on. We’re active on all of it. Where necessary we have prioritized things like responses (finding and fixing flaws; revoking affected certificates) over writing up details for this forum. That doesn’t mean we think that reporting our activities isn’t important. It is. Rather, the benefit the community gains from the information we post will still be there next week or the week after while other urgent matters get our focused effort right now.

(In reply to Tim Callan from comment #22)

(In reply to Ryan Sleevi from comment #21)

> > Checking: Have I missed a bug being filed?
> You have not. Our response is very actively underway. We posted our notice early to make sure the community knew about it as it touched on a current topic. We published a new CPS on May 21, and we are presently working to understand the scope and nature of resulting misissuance.

It would be great to make sure we have a new incident being tracked for this, to ensure it does not get lost here in this bug. I've filed Bug 1714628 for you to include your incident report on.

(In reply to Tim Callan from comment #22)
There was a formatting error in the above comment. It should read this way:

(In reply to Ryan Sleevi from comment #21)

> Checking: Have I missed a bug being filed?

You have not. Our response is very actively underway. We posted our notice early to make sure the community knew about it as it touched on a current topic. We published a new CPS on May 21, and we are presently working to understand the scope and nature of resulting misissuance.

> Between that and Comment #20 being a three-week difference, I'm hoping this isn't a sign that Sectigo is once again regressing Bug 1563579

In comment #15, I did request,

> For this particular bug we do request an exception to the guideline to report weekly, so we can all skip the “no new report” cadence. We instead would like to report in here when we have new relevant information and of course to respond to the community’s comments.

As the bug was subsequently closed, we have been of the belief that filing useful information as it becomes available is the right course of action for this bug, rather than looking specifically for a weekly cadence.

And I hope it's clear we are very active on Bugzilla, as opposed to what you saw when bug #1563579 was written up. Since comment #28 we have posted more than 7500 words across six bugs. Most of these are complex in the investigation and response they require. Even the one I expected to be straightforward (bug #1712120) has left us with a significant homework assignment.

There’s a lot going on. We’re active on all of it. Where necessary we have prioritized things like responses (finding and fixing flaws; revoking affected certificates) over writing up details for this forum. That doesn’t mean we think that reporting our activities isn’t important. It is. Rather, the benefit the community gains from the information we post will still be there next week or the week after while other urgent matters get our focused effort right now.

Our response to the DCV misissuance portion of bug #1712188 and the DCV issuance review detailed here have overlapped to the point where they have fused into a single effort that cannot be disentangled. They are, for all practical purposes, one project. Since this bug is closed and the other is open, it makes sense to report progress there. Please see that bug for updates on our progress.

Flags: needinfo?(tim.callan)
See Also: → 1718579
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [dv-misissuance]