1712188 - Sectigo: test certificates issued from trusted CA

Claves Nostrum

Reporter

Description

•

4 years ago

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0

Claves Nostrum

Reporter

Comment 1

•

4 years ago

Sectigo seems to have been issuing and revoking some strange certificates from this CA with test information in the subject field: https://crt.sh/?Identity=%25&iCAID=192864

Ryan Sleevi

Updated

•

4 years ago

Assignee: bwilson → tim.callan

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Whiteboard: [ca-compliance]

Ben Wilson

Updated

•

4 years ago

Type: defect → task

Tim Callan

Assignee

Comment 2

•

4 years ago

We received a report yesterday evening about a QWAC issued to Test User. We revoked it and have been looking for and revoking other certificates with the same problem. We are still ensuring all affected certificates are handled and investigating how this happened, so we're not ready to file a bug report yet.

We will of course file a report once we have the full story.

Tim Callan

Assignee

Comment 3

•

4 years ago

Attached file Misissued Sectigo QA test certificates 5-21-21.xlsx — Details

Tim Callan

Assignee

Comment 4

•

4 years ago

Here is a snapshot on where we are with this issue. This snapshot does not intend to be a complete writeup but rather is our attempt to keep the community informed in a timely fashion. We are still in discovery and response stages, so what you see here may evolve as we learn more.

On Wednesday evening we learned of a public certificate that we had issued to Test User. At that point we commenced an investigation into the nature, scope, and causes of this problem.

We have identified a recurring problem whereby as part of our post-deployment QA process we have issued certificates with incorrect details. These include obviously wrong information such as organization names like “QA,” four-digit zip codes, and state/locality mismatches. So far we have identified 38 of these certificates. I am attaching a list of the active certificates we’ve discovered so far.

All known misissued certificates are revoked, and in an abundance of caution we have revoked all certificates created by this QA account, even the large majority for which we see no issuance problems.

Four of these incorrect certificates were issued to unregistered domains, constituting a DCV failure. We have identified the root cause and implemented a block to prevent this misissuance from recurring. We have created a ticket to fix the underlying code to make this misissuance impossible.

The short story on the DCV misissuance is that we have retained legacy functionality from the days when manual DCV approval was an allowed as an “any other method.” This functionality remains because it plays a role in the ongoing service of enterprise accounts. As part of the setup process, each Sectigo enterprise customer receives a certificate with its organization details for a Sectigo-controlled domain, which we call an “anchor” certificate. These anchor certificates are a mechanism we have for persistent reuse of validation information during the allowed time period. Using a Sectigo-controlled domain avoids the customer’s inattention as a potential roadblock to onboarding and maximizes Sectigo’s control of the process.

By design our software prevents the reuse of DCV information across accounts. However, in the case of anchor certificates on Sectigo-controlled domains (the same domains being used across multiple enterprise accounts), we needed a way to allow issuance of these certificates. The company repurposed that original manual DCV functionality to this new process. This capability traditionally has been limited by permissions to a very small number of senior members of the validation team. These validation representatives were instructed to manually check that BR-compliant DCV had been successfully completed for the Sectigo-controlled domain within the allowed reuse time period before accepting this domain validation for certificates on new enterprise accounts. Validation staff were not permitted to use manual DCV for non-Sectigo-controlled domains.

As part of this investigation we have learned that this functionality was misused to issue four, known, QA-requested certificates without DCV occurring. Obviously that flies in the face of all our documented processes and philosophies. We don’t yet know how this decision was made, and we are actively investigating that.

We have disabled these permissions for all employees and built a workaround for the function it used to serve. Ironically, we already have an existing ticket to update DCV for anchor certificates so we can fully deprecate the manual DCV capability, but it is not live yet. As part of that ticket, manual DCV will be completely removed. (We are underway on an in-depth process to review and update our DCV processes and code [see bug #1694233], which already had identified manual DCV as a risk.)

We are actively running queries in our full certificate base (not just those requested by our QA department) to look for any other examples of DCV misissuance through this method.

Obviously this misissuance constitutes a failure of two parts:

Our software should have rendered it impossible to issue these certificates without DCV.
Our employees should not have issued any certificates for which full, correct authentication did not occur, be it for organization or domain information.

In addition to disabling manual DCV, we have ceased issuance of all QA-requested certificates until we fully account for what happened here.

We will continue to update the community as we learn more and take additional actions.

Ryan Sleevi

Comment 5

•

4 years ago

Tim: In considering failure modes here, I think there’s also a third major failure, if I’m understanding correctly so far: The failure to have detected this for so long, at least, based on the certificates disclosed thus far. As part of the incident report, it seems crucial to understand how Sectigo layers it’s compliance controls, and how this escaped detection until public report.

Flags: needinfo?(tim.callan)

Tim Callan

Assignee

Comment 6

•

4 years ago

(In reply to Ryan Sleevi from comment #5)

Tim: In considering failure modes here, I think there’s also a third major failure, if I’m understanding correctly so far: The failure to have detected this for so long, at least, based on the certificates disclosed thus far. As part of the incident report, it seems crucial to understand how Sectigo layers it’s compliance controls, and how this escaped detection until public report.

Acknowledged and agreed. We will address this point as part of our response to this incident. I'm leaving the needinfo request live until we cover this point.

Tim Callan

Assignee

Comment 7

•

4 years ago

(In reply to Tim Callan from comment #4)
There is an implication in this comment that we should make explicit. This bug is fueled by two distinct flaws, which we discovered together as we investigated the original certificate report.

Sectigo issued certificates ordered during the QA process without performing complete authentication.
The capability existed to issue a certificate without DCV being performed, and it happened.

Though they overlap, they are distinct flaws requiring distinct responses. We are working both matters in parallel. If you look at the examples below you will see how independent the two issues are.

CEASING ISSUANCE
Authentication problem: To prevent issuance of additional QA certificates without full authentication, we instituted a temporary policy to issue no certificates stemming from QA requests and clearly and forcefully renewed all employees’ training of our policy against issuing any certificates that do not undergo full authentication.

DCV problem: We rendered issuance of new certificates without automated DCV impossible by removing the permissions level that made manual DCV possible in our systems from all employees. We are implementing a code fix to eliminate this behavior regardless of permission level.

INVESTIGATION
Authentication problem: We reviewed all active certificates issued in response to requests from our QA department. Most were correct. 35 of them contained incorrect certificate details.

DCV problem: As we reviewed the certificates issued in response to QA requests, we observed a few unusual domains. Investigation quickly confirmed that these domains in fact were not authentic. This led to the obvious question of how these certificates were issued, and our investigation subsequently revealed the DCV problem discussed above.

We investigated all certificates issued as part of our QA process for DCV irregularities, discovering the four certificates reported earlier. We at present are running an investigation of our full active certificate base to find out if there are any other certificates with this problem.

ROOT CAUSE ANALYSIS
Authentication problem: Our investigation has determined that a decision was made years ago to allow our QA department to acquire certificates without requiring full – or really any – authentication. The rationale behind that decision is not clear, and the present team is having a hard time coming up with a rationale that makes any sense. As there is no existing documentation on that decision, we have to speculate that some kind of massive communication gap occurred between our validation team and our compliance team, where the former took away an impression radically different from what the latter intended.

But we can’t really know.

And of course we can uplevel the root cause by one degree and talk about the procedural and cultural factors that made it possible for such a decision to be unknown by some and unquestioned by others. We are deep into this aspect of our response and there will be more to discuss here.

DCV problem: I have covered this topic in comment #4. Note that understanding and alleviating this hole was pretty much our first priority when this bug came to light.

PERMANENT FUTURE FIX
Authentication problem: We have a firm, unambiguous policy that all public certificates undergo full authentication and are not issued without following proper procedure, including the fact that no certificate is to be issued without meeting all necessary criteria.

DCV problem: We have removed the permission level that enabled issuance without DCV from all employees. We have built a workaround for the enterprise account flow that was the original reason for this functionality to exist. We have a ticket to remove the code enabling DCV bypass, regardless of permissions.

Tim Callan

Assignee

Comment 8

•

4 years ago

(In reply to Tim Callan from comment #7)

DCV problem: We rendered issuance of new certificates without automated DCV impossible by removing the permissions level that made manual DCV possible in our systems from all employees. We are implementing a code fix to eliminate this behavior regardless of permission level.

This code fix is deployed.

Tim Callan

Assignee

Comment 9

•

4 years ago

Our response to the DCV misissuance portion of this bug and our DCV issuance review as detailed in bug #1694233 have overlapped each other to the point where they’re fused into a single effort with no possibility of disentangling them. They are for all purposes now a single project. Since this bug is open and the other is closed, it seems to make sense to report on that progress here. I will post a message on the other bug directing readers to this bug.

We are also working on a lengthy response to the QA test certificate portion of this bug and will be able to provide a substantial response to that portion this week.

Tim Callan

Assignee

Comment 10

•

4 years ago

In our continuing investigation of this issue, we have located an additional 25 certificates ordered by our QA department with incorrect certificate details, most often the organization name. All of them were issued to real domains that Sectigo controls. We revoked all 25 of them today between 1:44 and 2:37 pm EDT. The most recent was issued May 7 of this year (this bug was written up May 20).

We believe the root causes and governing issues are the same as for previously reported certificates ordered by our QA department for test purposes. Our response, investigation, and public reporting around this incident are ongoing.

Tim Callan

Assignee

Comment 11

•

4 years ago

Our comprehensive DCV code fix is checked in and is under QA testing. We are aiming to release in this coming weekend's maintenance window. On Monday we will provide an update on that release.

Ryan Sleevi

Comment 12

•

4 years ago

Tim: Can you share more details about the analysis you've done?

For example, I note https://crt.sh/?id=4140658722 as an example of an EV certificate with serialNumber 123456789

Comment #4 talks about the DCV side, which seems orthogonal to this certificate. It would appear this is Comment #7, which Sectigo positioned as "without performing complete authentication" . Comment #9 suggested the response would be available in that week, and it's unclear if you believe Comment #10 to be that, or if you're suggesting further delays.

To be clear: The last time we saw a CA that had test EV certificates issued was a significant contributor to the removal of trust in that CA. The responses to date seem to focus on this being a training issue, but I'm concerned that there's no real analysis about how the heck the systems themselves let this be possible, and no one notice. Similarly, I'm concerned with, and wanting to understand, what sort of anomaly analysis is in place for certificates - e.g. reviewing who issued/approved what - as I want to believe that these QA certificates would have otherwise been distinguishable from the data logs, and could/should have been noticed.

For example, at Google, if I went changing code in Search, it'd be detected and alerted before a line was ever changed. If I tried mucking about with other teams' docs, it would similarly trigger such. And I certainly can't approve anything related to Google's CA efforts. So understanding how these QA employees were not noticed - or, put differently, understanding the controls Sectigo has in place to ensure the level of authentication is appropriate - is key to understanding how such an egregious issue could happen.

Tim Callan

Assignee

Comment 13

•

4 years ago

(In reply to Ryan Sleevi from comment #12)

Tim: Can you share more details about the analysis you've done?

For example, I note https://crt.sh/?id=4140658722 as an example of an EV certificate with serialNumber 123456789

Comment #4 talks about the DCV side, which seems orthogonal to this certificate. It would appear this is Comment #7, which Sectigo positioned as "without performing complete authentication" .

In our posts we have been trying to distinguish between DCV misissuance and incompletely authenticated certificates, both of which we uncovered in this bug. For example, in comment #7 you'll see I tried to show the two categories and our status/response for each. That is because, while these issues overlap, they are separate issues with separate root causes and separate responses. We are trying to be systematic and organized in how we discuss these issues for clarity's sake, as these are complicated matters.

Comment #9 suggested the response would be available in that week, and it's unclear if you believe Comment #10 to be that, or if you're suggesting further delays.

We don't believe comment #10 is that analysis at all. We have been working the test certificates issue and the DCV issue very aggressively since we learned about the original cert issued to Test User. In comment #10 we put out a new set of discovered certificates so that the community had visibility on them, and at the bottom of that post we clarified that this investigation was not yet complete.

We haven’t been throwing off-the-cuff postings up because we don't want to represent that we have discovered all certificates misissued this way and then later discover more. We also don't want to represent that we have fully investigated the problem and its root causes and then later have to change what we say. At the same time we want to share progress on this issue with the community on an ongoing basis. That is the balance we’re trying to strike with these communications.

We have tried to give incremental updates to help the community see where the investigation is going -- and please be clear that new information is still coming in. We have a lengthy draft explanation that we haven't posted because we continue to refine it based on this investigation. I can understand the desire to know more, so we can post our present findings and analysis with the strong caveat that there likely is more to come. I will work on packaging up what we have so far to get a post live.

The responses to date seem to focus on this being a training issue, but I'm concerned that there's no real analysis about how the heck the systems themselves let this be possible, and no one notice.

We don't want to just dump this on training, although clearly we had a huge human process failure. At its core this problem was enabled by weak systematic checks on issuance, weak process documentation, and the belief that certificates ordered by the company for testing purposes somehow followed different rules. We have much more detail coming on this point.

Similarly, I'm concerned with, and wanting to understand, what sort of anomaly analysis is in place for certificates - e.g. reviewing who issued/approved what - as I want to believe that these QA certificates would have otherwise been distinguishable from the data logs, and could/should have been noticed.

For example, at Google, if I went changing code in Search, it'd be detected and alerted before a line was ever changed. If I tried mucking about with other teams' docs, it would similarly trigger such. And I certainly can't approve anything related to Google's CA efforts. So understanding how these QA employees were not noticed - or, put differently, understanding the controls Sectigo has in place to ensure the level of authentication is appropriate - is key to understanding how such an egregious issue could happen.

I didn't mean to suggest that the QA department was minting its own certificates. These certificates, ordered by our own employees, went through normal channels. However, our validation process skipped steps with these certificates because they were believed to be "internal." This failure is the core of the problem. Again, please stay tuned for the coming post.

Tim Callan

Assignee

Comment 14

•

4 years ago

I expect to publish a lengthy account of our investigation into the testing certificates portion of this bug and our action plan not later than Monday the 28th.

Our DCV code fix has passed QA and is set for deployment during our weekend window. I expect to publish a detailed account of our DCV misissuance investigation and response during this coming week.

I thank you all for your patience on this matter.

Tim Callan

Assignee

Comment 15

•

4 years ago

As detailed in comment #7, the original report of Test User kicked off an exploration for us that ultimately uncovered two separate issues, each with its own root causes, immediate response, and long term fix. They are very different in their nature. Managing both separate issues in a single bug is potentially confusing, and we have found ourselves going to pains to clarify which of these two issues we’re talking about in each individual post. As there is more to say on both topics, we believe it will better serve transparency, clarity, and the community’s knowledge if we break these discussions into two distinct threads.

Therefore we recommend starting a new bug for the DCV portion of this bug, leaving the testing certificates portion on this thread. If we receive no objections to this idea, we will open a new bug on Monday with a capsule summary of the DCV situation and a reference back to this bug for the full background.

Tim Callan

Assignee

Comment 16

•

4 years ago

A component of this bug is that Sectigo employees ordered certificates for made-up organizations and our validation department issued them. This post is a lengthy examination of why that occurred and our remediation plan to address the root causes of these failures. As this investigation is ongoing, we may discover more information and update this report in later posts.

Analysis

At its most basic level, this was a failure in our authentication process. However, our QA and compliance departments also contributed to this issue. We’ll start with QA.

QA
There is nothing inherently wrong with our QA professionals ordering certificates with incorrect organization information. As a point of fact, that is a real-world scenario that CAs must deal with regularly. It would have been a perfectly valid part of the QA process, had that been the point, to input orders designed to fail authentication and confirm they had done so.

But that was not the point in this case. Rather, the QA department didn’t see the accuracy of input information as being meaningful one way or the other except for where it was looking for a match between the order information and the final certificate. Bug # 1712120 comment #5 alludes to this practice in its description of stages 3 and 4, although of course we can order certs manually as part of our QA practice in addition to using automation.

Pre-production code goes through testing using a private Sectigo root so that bugs in the software cannot result in misissuance. We also, however, test production releases to ensure they have been deployed correctly. This testing must of necessity occur using our public roots to ensure the test conditions exactly match the subscriber’s experience. It is perfectly fine for the QA department to order certificates on public roots and have them issued. It is also perfectly fine for the QA department to place orders, not expecting them to be fulfilled, for certificates using incorrect information. What is not fine, obviously, is for any certificate on a public root to be issued with incorrect information. While no such errors occurred for certificates ordered by outsiders, it occurred repeatedly for internally ordered certificates used for testing.

This incident reveals some failures in our testing function and its use of certificates. The first is that the department should have registered that these certificates ordered for visibly wrong organizations and issued with visibly wrong certificate details should not have been issued. Certificates we found and revoked included,

Made up organization names that were not our company
Mixed state and locality from different countries
Four-digit zip codes for US addresses
Unregistered domain names

Any of these should have been a red flag once the issued certificate arrived. QA professionals who properly understood our process and our obligations as a public CA would have noticed these errors and brought them to someone’s attention. That didn't happen.

The second failure is that if the QA department was looking for a successful, end-to-end, ordering-to-issuance test, it should have known to include correct organization details from the beginning. It should have understood that the expected case for a certificate ordered for an organization that didn’t exist – or for a domain that didn’t exist – is failure to issue. That understanding was not present.

The root cause of these failures is lack of understanding of the requirements placed upon a public CA. We have QA professionals who have viewed their jobs in terms of designing and executing test cases, with no eye on our holistic performance of the expectations placed on a public CA.

We responded swiftly. We immediately suspended our QA department’s account, so that further certificates could not be requested and any pending certificate requests would not be issued. We kept that account disabled for more than a week while we took other measures to ensure this sort of misissuance could not occur again. Once those measures were in place, we reenabled this account. As QA certificate issuance was a requirement for releases to our production system, we had to suspend deployments for our public CA operation until we felt able to reenable certificate ordering from the QA department.

VALIDATION
The most obvious failure is that our validation department issued certificates ordered by our employees without requiring correct validation processes. The validation team team had a massive misunderstanding in its belief that because the CA itself was ordering these certificates for testing purposes, validation need not occur. Although many other Sectigo staff members keenly remember the "last time we saw a CA that had test EV certificates issued" and watched closely as this became "a significant contributor to the removal of trust in that CA," it has become clear that our validation department did not share this insight.

Members of our validation department have expressed the belief that some years ago compliance leadership told validation leadership that certificates ordered by us for testing purposes did not require validation. We have no documentation supporting that conclusion and an audit of our email system has revealed a 2019 email from our then Chief Compliance Officer stating the opposite. However this misunderstanding formed, the validation team had adopted this undocumented “rule” as SOP for orders of this type.

Our current compliance leadership team is hard pressed to imagine why the compliance team would have agreed to such a thing (a conclusion supported by our inability to find documentation of this verdict). We have to speculate that some kind of massive miscommunication occurred, perhaps exacerbated by the “telephone game” as information was passed down the validation organization from management to individual contributors and from older employees to newer ones. For example, the original ruling may have been that pre-production testing (which occurred on a private root) could be pushed through without authentication, and then over time that distinction became cloudy in the validation team’s minds. Let me reiterate that this is speculation as we can find no documentation of such a conversation.

This points to another problem in our validation function, which is that documented procedure was not required for action to occur. Our documentation of validation procedures is in many places light and/or out of date, including no explicit direction on internal testing certificates, and this deficiency has led our validation department to accept word-of-mouth ad hoc training as doctrinal.

Early this year we identified the need for a full review and upgrade of our validation documentation. The compliance department owns this project with extensive input from the validation team. I reference this project in bug #1714628 comment #4 as “Review and revamp of all validation process documentation and training.”

Another validation failure was the team’s choice to misuse our Manual DCV function to issue certs that were unable to pass DCV, as described in comment #4 and comment #7. Once again, this appears to owe itself to the belief that certificates ordered for our own testing don’t need to follow standard authentication rules.

A final failure point was the willingness to go along with a practice that did not align with the fundamental requirements of a public CA. There should have been individuals in our validation department saying, “Wait, are we sure we should be doing this?” Just as we should have had individuals in our QA department saying, “Wait, are you sure I should be getting these certificates?” These failure points are two of the same, and a little further below this post will go into more on this topic.

SYSTEMS
During the issuance period we lacked specific checks to isolate internally ordered testing certificates for extra scrutiny. We also were missing the type of systematic checks that would have blocked at least some of the issued certificates, like those with unreal registration numbers or zip codes. And of course our DCV code update, which we just deployed, would have prevented some of this misissuance as well.

This aligns with our general philosophy to put programmatic checks in place wherever possible. We have some of those systematic checks now in place in direct response to this incident and more on the way. We go into more detail in the “Response Plan” section, below.

COMPLIANCE
Ultimately it’s the compliance department’s job to find issues like this. We did not. Internal audit didn’t catch these certs because the total volume was low and none of them was ever in the random sample. WebTrust audit did not catch them for the same reason.

Realistically, discovery of this kind of failure would have had to come from a thorough internal investigation of our validation processes. I am confident our comprehensive DCV investigation (as detailed originally in bug # 1694233 and now in this bug) would have discovered it. We would have found the few certificates with DCV misissuance and upon looking at those certs, at least one of them would have contained visibly incorrect organization details. That would have opened the box the same way the Test User certificate did, and we would have wound up in the same place.

I also believe that our comprehensive validation documentation review and update would have caught it, given the chance. One could imagine someone from validation mentioning the “special case” missing from the documentation and the compliance project owner coming to the realization that this erroneous behavior was occurring. Unfortunately we aren’t yet far enough along in this project to have seen that potential event come about.

FAILURE TO PROPAGATE PUBLIC CA CULTURE TO EVERY NEW EMPLOYEE
These specifics are extremely important, but they are manifestations of a larger issue, which is the fact that we did not effectively indoctrinate all our new employees with a detailed understanding of the accepted practices of a public CA. Sectigo has grown significantly since its inception. At the time of the split from Comodo we had 76 people in the company. We currently have 425. We have made a great many additions to QA, support, validation, R&D and other departments. This episode has shown us that many employees are disconnected from other parts of the organization and our greater mission and purpose as a public CA.

We have employees in our validation and QA departments who knew the specific sets of steps they were supposed to take (for the most part) but didn’t seem to understand why, to understand the consequences of not following prescribed operating guidelines. We see a very narrow focus on the specific results of a QA test without registering a different problem that wasn’t in the scope of the test. We see the acceptance of a perceived (albeit false) policy that is deeply in opposition to the expected behavior of a public CA. These are examples of employees failing to understand our holistic value system and culture as a provider of instruments of public trust.

Of course, there are also plenty of employees who understand well our mission as a public CA. The members of the compliance team, the WIR team, and the executive suite, for instance, are crystal clear on this mission and what it entails. Many of us have been shocked to learn that we misunderstood the day-to-day behavior of other employees in essential roles. These people (myself included) knew what to do. The compliance team knew what to do. Senior leadership knew what to do. But we didn’t realize that not all employees were doing it.

It has become apparent that we have some cultural “bubbles” at Sectigo. In part they are based on geography. We have large offices in Odessa, Ukraine and Chennai, India, in addition to our offices in Bradford, UK; Roseland, New Jersey, and Ottawa, Ontario. Bradford, Roseland, and Ottawa have heavy concentrations of digital certificate veterans, product management, and senior leadership. Odessa and Chennai have heavy concentrations of technical individual contributors.

An employee in Ottawa might think of our responsibilities in terms of how to operate a trusted, compliant CA, while her equivalent in Odessa might instead be thinking of the general best practices that would apply to any technology provider. Either would believe she had a “correct” professional world view with appropriate priorities. Neither would imagine her peers in other offices having a different knowledge base or set of priorities. Neither would perceive any kind of rift in our attitudes or practices. They would each have a blind spot about how other employees think.

But these bubbles don’t just cut by geography. As mentioned above, we have a strong set of long-time PKI people and digital-certificate industry veterans. Let’s call them “PKI insiders.” But many of our newer employees are technology generalists who do not join us with a long history in digital certificates, even if they are expert in the traditional skill sets for their functions. Of the roughly 350 employees we have added since our inception as a standalone company, many of them come to us with no previous CA experience. Let’s call them “PKI outsiders.”

Looking at Sectigo, we have institutional factors that tend to clump PKI insiders and outsiders into their own groups. Senior management is very heavy in PKI insiders, so when I sit in an executive meeting, there is a high level of industry memory and understanding of the world of public CAs. But a product development scrum meeting, let’s say, is heavy in PKI outsiders, and the degree of long-term industry knowledge in that meeting is low.

In addition to the “bubbles,” we need to take a hard look at the employees’ perception of their own empowerment to raise concerns when a behavior does not appear to match industry expectations. It’s a real possibility that one or more employees engaged in or benefiting from this practice raised an eyebrow at what they saw but decided not to pursue it. They may have thought “Not my concern” or “Above my pay grade” or “These people must know what they’re doing.”

As we referenced in bug # 1708934 comment #15, we in the compliance team were of the impression that employees knew to do so. That’s because in the “insiders” bubble, the one occupied by the compliance team and by senior leadership, employees do. We consider fact-based pushback to be healthy, expected behavior. We listen and evaluate concerns on their merits. We seek data and other forms of relevant information. We do not punish those who speak up and actively encourage challenging viewpoints.

But obviously we have employees who didn’t do that, which appear to be the PKI outsiders. The tragedy here is that we have sought to create an open, transparent culture. We have a deliberately flat organization. We share a great deal of information about the business broadly across the company and several levels down the hierarchy. Before this incident I would have said all employees know they can speak up if they think something is wrong. Now we’re not so sure.

Response Plan

Comment #4 and comment #7 describe the immediate actions we took to shut off this kind of misissuance. This post covers the actions we are taking to address the root causes of these issuance failures as detailed above.

PROGRAMMATIC CHECKS
We have implemented or are implementing a set of programmatic checks designed to block and monitor for erroneous issuance of internally ordered test certificates. They are:

Automatic escalation for high-risk keywords
Automatic escalation for high-risk IP addresses
Testing certificate details lockdown
Automatic escalation of orders from testing accounts
Daily issuance reporting and review
Require dual approval for all account creation
Daily new account reporting and review

Automatic escalation for high-risk keywords - IN PRODUCTION
We have defined a set of keywords shown by previous activity to be used in organization names for testing certificate orders. Any order through any ordering channel with a keyword from this list appearing in its organization name is automatically flagged for vetting by our escalation team, with manual approval required before the order can commence. These keywords include:

The first and last names of every QA team member
The first and last names of every onboarding team member
The first and last names of every technical support team member
The name Sectigo
Previously used strings from the earlier noncompliant certificates including:
- QA
- Test
- SectigoTest
- SectigoQA
- Sectigossl
- Cybertls
- onboarding
- mgtechssl
- dynamicdata
- otttawa
- mg tech
- onlineuzrs
- SASP
- netscaler
- demo
- zxcv
- asdf
- qwerty
- 1234
- 0987
- abc
- ijn
- okm
- xyz
- qaz
- wsx
- edc
- rfv
- tgb
- yhn
- ujm

To deliver this capability quickly we repurposed an existing brand monitoring function. We will monitor this list as time goes on and can add, modify, or remove strings at will. This check is in production today.

Automatic escalation for high-risk IP addresses - IN PRODUCTION
We have implemented a check for any order coming from an IP address of any Sectigo office. All such orders are automatically held for review by the escalation team before the order can proceed. This check is in production today.

Testing certificate details lockdown – JULY 13
All existing test certificate accounts will be replaced with new accounts put in place by management with input from the compliance team. We will programmatically restrict these accounts to orders using a predefined set of organization details and domain names, all of which will be valid for Sectigo. We have prohibited the creation of new accounts outside this procedure for all employees, and failure to follow this mandate is grounds for termination. We are presently auditing our use of internal accounts so they all can be shut down and replaced with official accounts.

As there is no need for these certificates to persist in active status, certificates from these accounts will be limited programmatically to one-day term.

Once they are settled, we will publish the organization names and domains available to these testing accounts so that the community can easily monitor our use of them through crt.sh.

Automatic escalation of orders from testing accounts - JULY 13
All orders placed from the test accounts defined above will be automatically flagged for processing by the escalation team before the order is allowed to proceed. The nature of the testing certificate details lockdown may make this check unnecessary. We will evaluate it to see if we should discontinue it over time.

Daily issuance reporting and review – JULY 13
A daily report of all orders placed from Sectigo internal accounts and all escalations for the previous day will be generated and circulated to the WIR team for review.

Require dual approval for all account creation – RELEASE DATE TO BE DETERMINED
All newly created accounts will require dual approval before activation.

Daily new account reporting and review – JULY 13
A daily report of all accounts created the previous day will be generated and circulated to the WIR team for review.

Existing projects – VARIABLE DATES
A final point is that programmatic quality initiatives we already have underway such as our comprehensive DCV review and update and our Guard Rails project will be further protections against this kind of misissuance in the future.

PERSONNEL
The individuals in validation and QA responsible for these decisions have been removed from their current responsibilities while we conduct our investigation. Out of respect for employee privacy, we are presently unable to share more information.

It’s worth mentioning that, for reasons unrelated this incident, we have a new Vice President of Support Operations (Support and Validation) as of May 10.

EMPLOYEE EDUCATION AND CULTURE
I have written in bug #1714628 comment #9 about what we are doing to instill consistent knowledge of the expectations for public CAs, especially using our Sectigo University program.

Our all-employee public CA training will be in production by August 1. This training will focus on ensuring that employees understand the expected behavior of a public CA as a steward of online trust, including the fundamentals of certificate practices and compliant issuance.
Our detailed training will be in production for all specialty roles by October 1. These custom trainings will focus on the specifics of issuance requirements and our process so that employees in technical roles such as engineering, testing, and technical customer service are better equipped to make the decisions that conform with compliance. A software developer who knows the validation rules is less likely to write a bug. A QA engineer who knows these rules is more likely to notice anomalous behavior. A tech support rep who knows them is better enabled to give customers useful guidance about what they may and may not do with their certificates.

We will augment that with widely used methods like town hall meetings and employee-facing marketing programs. It’s apparent that these activities, so far, have neglected the fundamentals discussed above. As an example, the Chief Compliance Officer (yours truly) will present at every company all-hands moving forward.

Flags: needinfo?(tim.callan)

Andrew Ayer

Comment 17

•

4 years ago

I think that Comment 16 is a bit heavy on training and blaming employees who make mistakes (which we know from past incidents to be inadequate) and light on systemic fixes.

For instance, it seems like a major contributor to this incident was that the validation team knew that the certificate orders came from the QA department. Had the orders been indistinguishable from a normal order, the validation team would have validated them like a normal order instead of believing they could skip validation.

Are there any systemic fixes that could address this? For instance, could the QA team place orders from accounts that are not obviously linked to Sectigo? Could parts of the validation process be "blinded" (so the validation team doesn't know what account made the order)? Subjecting QA orders to extra restrictions and escalations would be counter-productive if it makes QA orders more obvious, since it would make the system more susceptible to human error of treating QA orders differently from normal orders.

Flags: needinfo?(tim.callan)

Tim Callan

Assignee

Comment 18

•

4 years ago

(In reply to Andrew Ayer from comment #17)
The seven programmatic checks listed in the Response Plan are systematic fixes to directly address the problem by removing employees’ ability to go off the ranch. Between them these checks restrict the certificates that can be ordered internally to compliant and pre-approved details, limit their term to one day, and provide full visibility on all such orders. The escalation capability offered very short time to deployment, so it was the first action to take, along with clearly articulating the correct policy to every involved employee, of course. We expect the systematic restrictions to prevent a repetition of this problem.

QA is not here to lay traps for the validation team. QA is here specifically to confirm the correct working of our software and systems (and in this instance the successful deployment of previously confirmed code). Therefore there is no utility in camouflaging internally ordered certificates when we can programmatically prevent the malfeasance.

Note also that we are locking down more than the QA department. We didn’t expect this to happen from any single part of the company, but since it turns out to have happened, we want to prevent its repetition from any part of the company at all. We don’t want a sales person or a tech support person or a senior VP to be able to make a phone call and wind up with the same result. When we have things like order numbers and telephones and employees who know each other’s names, there is no real concept of “blinding” an order. In that scenario if one employee wanted special treatment for an order, a simple email could do the trick. Tell the validation rep which order number you’re interested in, and the blind is removed.

We’ve tried to think broadly in our response to this issue. We looked for immediate action to prevent this misissuance in the short term. We looked for software changes to put in place to render it permanently impossible for employees to repeat this behavior, regardless of training or intent. We looked for visibility programs to satisfy ourselves and the public that these other changes had their intended effect.

But we couldn’t ignore the human aspect in this case. It’s an undeniable part of this incident and failing to address it would be irresponsible. While we are putting systems in place to render this behavior impossible regardless of what the individual reps try to do, we also need to ensure they know what the right decision looks like. That knowledge should be part of the daily decision making for all sorts of employees in all sorts of ways. I struggle to see how our investigation or this writeup would have been complete without addressing that part of it.

Flags: needinfo?(tim.callan)

Ryan Sleevi

Comment 19

•

4 years ago

(In reply to Tim Callan from comment #18)

The seven programmatic checks listed in the Response Plan are systematic fixes to directly address the problem by removing employees’ ability to go off the ranch. Between them these checks restrict the certificates that can be ordered internally to compliant and pre-approved details, limit their term to one day, and provide full visibility on all such orders.

Perhaps I'm missing something important, but my reaction is similar to Andrew's: this seems to attribute a lot to human error and judgement. I'm also not seeing how the paths outlined - which seem largely to be blocklists and escalation - actually achieve what you've stated here, which is restriction to compliant and pre-approved details.

Obviously, the issue with DCV can only be meaningfully addressed by ensuring there is no path to manually bypass DCV (whether skipping it entirely or claiming apriori it's been completed). Yet for other issues, such as organization fields, it seems there can be similar meaningful technical controls (such as those DigiCert has previously reported on), without requiring this "escalation + don't be naughty training".

I appreciate that there are clearly corporate cultural issues here, which seem remarkably similar to Symantec's cultural issues leading to distrust. Symantec's focus (on sacking the people responsible) didn't really inspire confidence, when it was the systemic complexity and lack of strong technical controls in place.

As long as there are processes in place that require manual interaction or validation, I understand that the risk being highlighted here will exist: someone can be convinced to say "I did the work," when they did not do the work. Yet it would seem the same risk of "Oh, I know Brenda, I can just click OK here" is the same risk as "Oh, Joe Malicious gave me $500 to click OK on this request". The former might be caught by your blocklists, but the latter certainly will not. Figuring out how to remove people from the loop, as much as possible, seems to be a better strategy than blocklisting based on past failures to try to prevent new ones.

Equally, I'm really trying to wrap my head around QA-in-Production, especially at the scale being suggested here. What's lacking in this analysis is a sort of deeper dive as to "Why QA-in-Prod", and similarly, "Why do QA this way?" Don't get me wrong, even browsers like Chrome and Firefox have manual QA teams that test features in release and pre-release binaries, so I get it that machines aren't perfect. But I would have expected a more detailed evaluation about how much can and should be replaced with automation in test infrastructures. In general, when you find someone needing to QA in prod infrastructure, it suggests that the Prod infrastructure may be... very "ad-hoc", and so creating a "test" infrastructure is a manual process or unlikely to yield the same results. That, however, speaks to system design and architecture.

I don't mean to pick on Rob here, because https://crt.sh remains a truly impactful contribution by Sectigo, but despite being open-sourced, it's not exactly something one can just deploy outside of Sectigo's infrastructure. If that philosophy extends to Sectigo's current production, then it suggests deeper systemic issues, not only in that it makes it difficult to spin up testing infrastructure, but also in the risks to maintaining version controlled configuration, disaster recovery approaches, and intrusion detection.

I realize it might seem hyperbolic to go from "QA made a mistake" to "Oh, you could be at risk of being the next DigiNotar", but hopefully I've captured the sort of systemic thinking. "Why was QA testing in prod" opens up a host of new questions, and I hope you can provide more detail there, and not just "Why did QA make a mistake", as Comment #16 gets into.

Flags: needinfo?(tim.callan)

Tim Callan

Assignee

Comment 20

•

4 years ago

(In reply to Ryan Sleevi from comment #19)

Figuring out how to remove people from the loop, as much as possible, seems to be a better strategy than blocklisting based on past failures to try to prevent new ones.

We very much see that. If you look across multiple conversations we’re having on multiple bugs, you’ll observe that this year’s biggest theme is replacing human judgement and human labor with defined software operations. This strategy is reflected in open bugs like:

Bug 1710243 (Invalid stateOrProvinceName)
Bug 1712120 (Inappropriate subject:serialNumber information in EV certificates obtained through ACME)
Bug 1715024 (Misspellings in stateOrProvince or localityName fields)
Bug 1715929 (Incorrect EV businessCategory)

Blocklisting can be a useful addition to automation initiatives like these but is not a surrogate for them.

Equally, I'm really trying to wrap my head around QA-in-Production, especially at the scale being suggested here. What's lacking in this analysis is a sort of deeper dive as to "Why QA-in-Prod", and similarly, "Why do QA this way?" Don't get me wrong, even browsers like Chrome and Firefox have manual QA teams that test features in release and pre-release binaries, so I get it that machines aren't perfect. But I would have expected a more detailed evaluation about how much can and should be replaced with automation in test infrastructures. In general, when you find someone needing to QA in prod infrastructure, it suggests that the Prod infrastructure may be... very "ad-hoc", and so creating a "test" infrastructure is a manual process or unlikely to yield the same results. That, however, speaks to system design and architecture.

As part of our release process, we perform post-deployment QA and retain and audit the recorded results, which we communicated to the Bugzilla community in bug 1518553 comment 15.  Our QA team runs a subset of available tests, depending on the areas affected by the release. We generally deploy releases weekly, which are mostly incremental and do not require running complex test suites after deployment. For the vast majority of releases we are not ordering certificates on live systems. We typically only order production certificates for major events such as failover tests, database upgrades, and complex feature deployments touching multiple areas and/or having cross-product dependencies.

Some aspects of production environments are not simple to replicate on “test” infrastructure. CAs and HSMs are production units, and some other elements are not identical, such as production traffic patterns and data. Limited post-deployment tests can quickly confirm that our post-release production systems are functioning correctly end-to-end.

We have considered the possibility of restricting the post-deployment QA that occurs in our production environment to use untrusted private CAs exclusively. However, even though such a restriction would have stopped the incident described in this bug from occurring, it's not an approach that we intend to take. To fully test, end-to-end, any deployment that may affect production certificates, our QA playbook demands that those certificates are not only issued but also collected, installed, and tested with various client software, which in most cases necessitates the use of publicly trusted certificates.

Over the past three years we have increased test automation coverage, and now more than 75% of release testing occurs through automation. We have been aggressively increasing automated QA coverage of legacy code, and all newly developed features include QA automation from the outset. With these improvements we are now able to support more than forty releases a year as compared to three years ago when that number was fewer than twenty.

We report and record all test results. For post-deployment testing, we understand and agree that our goal needs to be full automation with prepared data sets and automatic checking and approval of results., We want to add automated cleanup (such as certificate revocation) where necessary. Fortunately, this is not a huge leap as the fundamentals of our automation coverage are in place.

Our QA environment (“test” infrastructure) yields excellent results, and we have not had cases where we needed to backtrack releases due to severely broken functionality. We have not observed differences in test results between “test” infrastructure and production infrastructure. That being said, test automation is a work in progress with more opportunity to improve both the quality of our tests and testing data sets and our automation coverage. We are diligently and continually getting better in both areas. Our plan is to reach 90% automation of release testing in QA environments by the end of 2021. As for post-deployment testing our aim is to reach 100% automation by September 1, 2021.

Flags: needinfo?(tim.callan)

Tim Callan

Assignee

Comment 21

•

4 years ago

(In reply to Tim Callan from comment #16)

Daily new account reporting and review – JULY 13
A daily report of all accounts created the previous day will be generated and circulated to the WIR team for review.

This report is now in production and we are reviewing new account creation daily.

Tim Callan

Assignee

Comment 22

•

4 years ago

(In reply to Tim Callan from comment #16)

Daily issuance reporting and review – JULY 13
A daily report of all orders placed from Sectigo internal accounts and all escalations for the previous day will be generated and circulated to the WIR team for review.

This report is in production and we are reviewing escalations and internal orders daily.

Ryan Sleevi

Updated

•

4 years ago

Comment 23

•

4 years ago

(In reply to Tim Callan from comment #16)

Once they are settled, we will publish the organization names and domains available to these testing accounts so that the community can easily monitor our use of them through crt.sh.

As if today, all QA certificates we order will come from a single account with these details.

Domain:
sectigo.ca

Organization information:
Sectigo Limited
26 Office Village, Exchange Quay, Trafford Road 3rd Floor
Salford, Manchester
M5 3EQ
United Kingdom
businessCategory - Private Organization
companyNumber – 04058690

Tim Callan

Assignee

Comment 24

•

4 years ago

As we have been working with our new testing certificate profile, we have made two improvements.

In addition to sectigo.ca, we are also able to issue testing certificates for the domains sectigotestcerts.com, sectigotestcerts.net, and sectigotestcerts.org. This allows us to break up the certificates we order by use case and know an entire set of certificates is on a single domain name.
We are increasing the duration of ALL certs on our testing accounts from 1 day to 7 days. This will reduce the timing inconvenience of things like weekends, holidays, sick days, and the unexpected stuff that comes up in people’s lives.

Tim Callan

Assignee

Comment 25

•

4 years ago

Attached file testing_certificates_20210722.xlsx — Details

Tim Callan

Assignee

Comment 26

•

4 years ago

We have completed our research of misissued testing certificates. To do so we have conducted a search of all accounts used for certificate testing across all departments including QA, development, onboarding, sales, and technical support. We shut down any account discovered this way if it had the ability to request certificates on public roots. We allowed accounts that are strictly enabled for private roots to remain in place. This investigation was quite thorough and exacting, and we are confident we have discovered all such accounts.

These departments now all use newly established accounts following the criteria described in comment 23 and comment 24.

As we discovered these accounts, in addition to shutting them down we immediately revoked all active certificates, prior to investigating them for misissuance. We have subsequently examined all active, public certificates ordered from any of these accounts and discovered an additional 30 certificates from those accounts that were issued to nonexistent organization names or with incorrect organization details. Since we performed the abovementioned mass revocations in advance of investigation, all these certificates were revoked on June 17 or June 18. I have included the links in attachment 9232639 [details].

Tim Callan

Assignee

Comment 27

•

4 years ago

We are monitoring this bug and are working on a seven-point writeup.

Tim Callan

Assignee

Comment 28

•

4 years ago

(In reply to Tim Callan from comment #16)

Require dual approval for all account creation – RELEASE DATE TO BE DETERMINED
All newly created accounts will require dual approval before activation.

We released dual approval for account creation into production on July 25.

Tim Callan

Assignee

Comment 29

•

4 years ago

1. How your CA first became aware of the problem

On May 19, 2021 we received a misissued certificate report from another CA regarding a QWAC issued to Test User. We revoked the certificate and began an investigation to see how it had occurred.

2. Timeline

May 19, 2021
Sectigo receives the initial report about a QWAC issued to the wrong party. Sectigo revokes this certificate.

May 20
Initial report for bug 1712188 is filed by an observer, who notices the previous day’s revocation. Sectigo replies to state we received the initial report and performed that revocation in response.

May 21
Sectigo provides a preliminary report of what we knew at the time about incompletely validated testing certificates in comment 4.

May 21 to July 22
Sectigo commences a detailed investigation of our certificate ordering and issuance process for “testing” certificates from anywhere in the company, including QA, technical support, onboarding, and sales. Sectigo is simultaneously investigating our programmatic DCV issuance practices as detailed in bug 1718771 and our misuse of Manual DCV as covered in bug 1718579. These investigations are intertwined.

June 17
We announce 25 additional certificates with incomplete validation discovered and revoked from additional testing accounts.

June 17 to 18
Sectigo performs sweeping revocations of certificates requested from internal accounts prior to investigation of these certificate bases for misissued certificates. Subsequent research discovers a total of 30 additional certificates from these accounts with incorrect organization details, as explained in comment 26.

June 26
We implement automatic escalation of certificate requests containing high-risk keywords observed in previously misissued testing certificates.

June 26
We implement automatic escalation of certificate requests coming from IP addresses for Sectigo offices.

June 28
Sectigo publishes a lengthy analysis of the testing certificates problem in comment 16. This analysis covers failure points in our QA, validation, and compliance functions as well as the general cultural failure leading to this problem. This comment discusses how our systems failed to prevent this behavior.

The comment goes on to explain in detail our technical, training, and cultural mitigation plans. The technical mitigation plan includes seven programmatic checks,

Automatic escalation for high-risk keywords
Automatic escalation for high-risk IP addresses
Testing certificate details lockdown
Automatic escalation of orders from testing accounts
Daily issuance reporting and review
Require dual approval for all account creation
Daily new account reporting and review

July 6
Our daily new account report goes into production and daily review begins.

July 12
Our daily issuance report goes into production and daily review begins.

July 13
Sectigo announces the details of internal testing certificate accounts in comment 23. We provide subsequent updates on these details on July 19 in comment 24.

July 22
We announce the completion of our research and the final 30 discovered certificates (which were among those revoked on June 17 and 18).

July 25
Dual approval for new account creation released into production. This represents the last of the seven technical checks announced on June 28.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem

We have ceased issuance of certificates with incomplete authentication.

4 A summary of the problematic certificates

97 certificates issued between August 13, 2019 and June 10, 2021.

5. Affected certificates

We provided links to discovered certificates in comment 3, comment 10, and comment 25. These and the certificate originally reported in comment 1 entail the full set of affected certificates.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now

Comment 16 includes a detailed explanation of how these mistakes occurred.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future

Comment 16 includes a detailed explanation of our technical, training, and cultural mitigation plans.

At this stage we have implemented all seven programmatic checks outlined in that comment.

We have held three all-employee town hall meetings to discuss our obligations as a public CA and have stood up mandatory training for all employees on this topic. We are on track to roll out additional detailed trainings for technical and validation staff by October 1.

Nikola Maksimovic

Comment 30

•

4 years ago

In addition to the domain names we announced in comment 23 and comment 24, we have added one more domain name for use with test certificates, which is sectigodev.com. We will also issue certificates for subdomains of this name.

Ryan Sleevi

Comment 31

•

4 years ago

To make sure: Sectigo test certs will be for the following domains:

sectigo.ca
sectigotestcerts.com
sectigotestcerts.net
sectigotestcerts.org
sectigodev.com

Comment #23 originally stated:

As if today, all QA certificates we order will come from a single account with these details.

Comment #24 then modified that:

This allows us to break up the certificates we order by use case and know an entire set of certificates is on a single domain name.

Comment #30 then expanded this

To be honest, I don't have a clear picture about why these expansions or why these multiple domains are used. It would, naively, seem that the original plan in Comment #23 was the best plan of action. I'm hoping you can share clearer technical details (than those provided in Comment #24) about why the change in plans?

Flags: needinfo?(tim.callan)

Tim Callan

Assignee

Comment 32

•

4 years ago

(In reply to Ryan Sleevi from comment #31)

I'm hoping you can share clearer technical details (than those provided in Comment #24) about why the change in plans?

It is an easily identifiable source of information for how a certificate came into being. When we first had the idea of locking down the domains used for testing certificates, we started with a single domain, which was sectigo.ca. Then as we began to spin up our new accounts using the new details, we realized the potential for assigning different domains to different groups inside the company. So we created the next set of three domains – sectigotestcerts.com, sectigotestcerts.net, and sectigotestcerts.org – for use by Development and QA, Sales and Onboarding, and Technical Support, respectively. That has the benefit that anyone in the company can look at any certificate and know the department from which it originated without any specific access or investigation required.

We subsequently decided to further increase visibility on usage by splitting out certificates requested by Development from those requested by QA. That is where sectigodev.com comes in. We could have established a new domain such as sectigotestcerts.info. However, sectigodev.com had been used by our Development team prior to the creation of this bug. Choosing that domain made it possible to use processes and scripts that we previously had built without changes, reducing the risk of errors and hang-ups due to switchover. All else being equal, simpler implementation is preferable. So we decided to break from our previous pattern to take advantage of earlier work.

One pragmatic consequence of this process is that we may use sectigo.ca seldom or never, as the foreseen, “testing” certificate usage is from the four groups listed above. However, in the event of some scenario that falls outside these groups, we have that as a domain we are allowed to use without messing up our easy method of identifying the source departments of certificates.

Flags: needinfo?(tim.callan)

Ryan Sleevi

Comment 33

•

4 years ago

•

Edited

Thanks. I was curious if there are technical reasons for using different registerable domains, as opposed to, say, qa.sectigo.ca, dev.sectigo.ca, sales.sectigo.ca, etc? It would seem to have the same benefits, but still use the original domain name.

The subtext to the question here is trying to understand the impact this design has on how domain validation is tested. If there is a meaningful difference between qa.sectigo.ca vs sectigotestcerts.com, in terms of how it interacts with the system, then it suggests also the possibility of differences between the different registerable domains. It also suggests that the validation system may be special-casing its notion of registerable domains (or, worse, two label domains) different from three-label domains.

Given the overall validation issue that interconnected this, the goal was to better understand the edge cases of the validation system, in the event there are other potential issues.

To be clear: I'm not saying you need to go back to just sectigo.ca. I'm trying to understand a bit of the rationale for the current decision, and trying to understand if alternatives were considered and rejected, not considered, or something else entirely :)

Flags: needinfo?(tim.callan)

Tim Callan

Assignee

Comment 34

•

4 years ago

(In reply to Ryan Sleevi from comment #33)

I was curious if there are technical reasons for using different registerable domains, as opposed to, say, qa.sectigo.ca, dev.sectigo.ca, sales.sectigo.ca, etc?

Had there been a compelling reason to go that route, we might have. Lacking any such compelling reason, we preferred the slightly more flexible option we went with, more flexible in that we can order a certificate for an apex domain without forsaking our easy identification method as described in comment 32. Can we imagine a scenario where we would want to differentiate apex domains from subdomains? Sure. What if a bug is reported that only occurs with one or the other? In that circumstance we could find ourselves testing scenarios where the presence or absence of a subdomain matters.

This is not a big deal at all, and as I said earlier, we could have gone the other way if there were a reason to. But we saw no such reason, so we picked the strategy that appeared to be slightly better.

If there is a meaningful difference between qa.sectigo.ca vs sectigotestcerts.com, in terms of how it interacts with the system, then it suggests also the possibility of differences between the different registerable domains. It also suggests that the validation system may be special-casing its notion of registerable domains (or, worse, two label domains) different from three-label domains.

We are not aware of any such behavior.

Flags: needinfo?(tim.callan)

Tim Callan

Assignee

Comment 35

•

4 years ago

Are there any more questions or comments we can address regarding our testing certificate domains or any other aspect of this issue?

Tim Callan

Assignee

Comment 36

•

4 years ago

At this point we have described this flaw and its root causes in depth and delivered all remediation actions except one. We have delivered part of the detailed training for specialty roles as described in comment 16 and are on track for full delivery by October 1. We will continue to monitor this bug and announce the deployment of that training when it occurs.

Tim Callan

Assignee

Comment 37

•

4 years ago

We remain on pace to deliver our training for specialty roles by the end of this month.

Tim Callan

Assignee

Comment 38

•

4 years ago

We have delivered all content for our additional training for technical employees and at present are adding it to Sectigo University.

Tim Callan

Assignee

Comment 39

•

4 years ago

On Friday, October 1 we launched our additional training for relevant technical employees to Sectigo University.

Tim Callan

Assignee

Comment 40

•

4 years ago

The deployment of our specialty technical training to Sectigo University represents the last of our specific, committed mitigation activities for this issue. We will continue our campaign of cultural education about the full breadth of public CAs’ responsibilities as part of our ongoing practice, month in, month out.

Are there any other matters we should be discussing regarding this issue?

Tim Callan

Assignee

Comment 41

•

4 years ago

A great deal has been written on this bug, and all the community’s questions and comments appear to be addressed. Ben, is it time to close this bug?

Flags: needinfo?(bwilson)

Ben Wilson

Comment 42

•

4 years ago

I'll consider closing this on Wed. 20-Oct-2021, unless I receive comments to the contrary.

Ben Wilson

Updated

•

4 years ago

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Flags: needinfo?(bwilson)

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

3 years ago

Product: NSS → CA Program

David Lawrence [:dkl]

Updated

•

3 years ago

Whiteboard: [ca-compliance] → [ca-compliance] [ov-misissuance]

Misissued Sectigo QA test certificates 5-21-21.xlsx 4 years ago Tim Callan 10.95 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet		Details
testing_certificates_20210722.xlsx 4 years ago Tim Callan 11.75 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet		Details