Closed Bug 1724458 Opened 3 years ago Closed 2 years ago

Sectigo: Mojibake in certificate Subject fields

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tim.callan, Assigned: tim.callan)

Details

(Whiteboard: [ca-compliance] [ov-misissuance])

Attachments

(3 files)

10.52 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
13.26 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
11.41 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Attached file 20210806_Mojibake.xlsx
No description provided.

1. How your CA first became aware of the problem

On July 12 our internal investigation of our certificate base discovered a series of certificates containing Mojibake in place of UTF-8 encoded extended ASCII characters. Further preliminary investigation using crt.sh suggests that Mojibake can find its way into any free-text Subject field, including organizationName, organizationalUnitName, and localityName.

We do not yet fully understand all the root causes for how Mojibake can enter into certificate requests. We have a mitigation plan as described in point 7 of this comment.

2. Timeline

All times Eastern Daylight Time

July 12, 2021, 2:54 pm
Internal investigation of our certificate base reveals certificates that appear to have Mojibake in the subject:organizationName field. An initial search reveals 41 certificates from the same geographic region with this problem.

July 17, 1:17 pm
All known certificates revoked.

July 12 to August 6
Investigation of this phenomenon, how it occurs, and what we can do about it. This problem turns out to be complex. See parts 6 and 7 of this incident report for more detail on this point.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem

This is an intermittent problem that is difficult to characterize with no clear, simple, programmatic block. We have created an action plan for blocking this type of misissuance, as described in part 7 of this comment.

4. A summary of the problematic certificates

41 known certificates issued between March 31, 2021 and May 10, 2021.

Once we have a fix in production we will undertake a comprehensive search of our certificate base for further certificates with this problem and revoke them. Our early research to characterize this behavior suggests other CAs have active certificates with the same problem. We will report any we discover in our research to those CAs as well.

5. Affected certificates

The known, affected certificates are included in attachment 9235150.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now

We discovered in our research that OpenSSL’s CSR parsing code, which our CA uses, has a known bug when dealing with T61Strings (aka TeletexStrings), an arcane ASN.1 string type. We examined the CSRs that correspond to the problematic certificates, and for many of them we found UTF-8 characters stored in T61String attributes. Unfortunately, further research revealed instances of Mojibake in certificates where the corresponding CSR did not contain any T61String attributes, meaning that this particular OpenSSL bug affecting our CSR parser is not the only root cause.
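The mechanism behind this class of bug can be illustrated with a short sketch (illustrative only; the actual OpenSSL code path is more involved). Parsers that treat T61String contents as Latin-1 will re-decode UTF-8 bytes one at a time, producing the garbled output seen in the certificates:

```python
# Illustrative sketch: UTF-8 text stored in a T61String attribute, then
# read back by a parser that assumes T61String is effectively Latin-1.
original = "Müller GmbH"                  # intended organizationName
utf8_bytes = original.encode("utf-8")     # b'M\xc3\xbcller GmbH'

# Each UTF-8 byte is wrongly interpreted as its own Latin-1 character:
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # MÃ¼ller GmbH  <- Mojibake as seen in the certificate
```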

The Subject Identity Information in many of the problematic certificates found so far was validated by an external RA. We don’t have visibility into this external RA’s inner workings, but we believe it’s possible that this RA’s CSR parser handles T61Strings differently, such that the Validation Specialist would have seen only the intended strings and would have been completely unaware that some characters would later become garbled.

For the remainder of the problematic certificates, we have to conclude that the root cause is human error. Our Validation Specialists did not spot the Mojibake prior to issuance.

These problematic certificates evaded detection until now because they are relatively few in number and none were previously sampled for internal audit.

To further complicate matters, we found that Mojibake can make its way into official government databases as well, as in this example from Companies House. In this instance, it would arguably be misissuance for the CA to issue an OV or EV certificate to this company without Mojibake in its organizationName field.

We concluded that enumerating and understanding all of the ways that Mojibake might enter certificate requests would be akin to boiling the ocean, and even within the realm of what we do understand, we were unable to conceive of a programmatic check that would be free of false positives and false negatives. Consequently, we instead have focused on a practical strategy to better equip our Validation Specialists with tools to prevent issuance of certificates containing Mojibake.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future

We have arrived at this strategy for preventing issuance of certificates containing Mojibake. This is a broad strategy and we’re still looking at specifics of implementation. While we don’t have a specific schedule yet, we expect this rollout to come in a matter of weeks, not months. We’ll provide an update when we have a firmer schedule.

We will introduce a new validation step we’ve dubbed "Character Set Review." We will add a programmatic check of certain Subject Name attributes of every OV or EV certificate request for one or more characters outside the set of printable ASCII characters. If such characters are found, our system will hold issuance and require manual review by a Validation Specialist.

The Validation Specialist will use existing qualified data sources to confirm the string or determine the true text that should appear in the offending field. If a Subject Name has been flagged for Character Set Review on a previous occasion and okayed by Validation, then this Subject Name will be flagged as approved and future issuance can go through automatically without requiring manual Character Set Review. If a Subject Name has been flagged for Character Set Review on a previous occasion and modified by Validation prior to issuance, then the original and modified Subject Names will be recorded so that we can automatically apply the same modification in the event that the customer sends us the same problematic Subject Name in the future.
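A minimal sketch of the gating flow described above (the names and data structures here are illustrative assumptions, not Sectigo's implementation):

```python
# Illustrative sketch of the Character Set Review gating logic.
# `approved` and `corrections` stand in for whatever persistent store is used.
approved = set()        # Subject Names previously reviewed and okayed
corrections = {}        # original Subject Name -> Validation-modified version

def has_non_printable_ascii(value):
    """True if any character falls outside printable ASCII (0x20-0x7E)."""
    return any(not (0x20 <= ord(ch) <= 0x7E) for ch in value)

def process_subject_name(name):
    """Return (name_to_issue, needs_manual_review)."""
    if not has_non_printable_ascii(name):
        return name, False               # nothing suspicious; proceed
    if name in corrections:
        return corrections[name], False  # reapply the earlier modification
    if name in approved:
        return name, False               # previously okayed; no repeat review
    return name, True                    # hold issuance for Character Set Review
```

A name held for review would then either be okayed (added to `approved`) or modified (recorded in `corrections`) by the Validation Specialist.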

We believe the primary beneficiary of this process will be the subject:organizationName field. Most Subject Name fields in the OV and EV certificates that we issue either are locked down to a discrete set of allowable values (such as the countryName and businessCategory fields) or have been eliminated or are soon to be eliminated (such as streetAddress, postalCode, and localityName). The three that remain are organizationName, organizationalUnitName, and joiLocalityName.

For the organizationName, localityName and joiLocalityName Subject Name fields, we will use “Character Set Review” as detailed above.

Validating the “correct” string for subject:organizationalUnitName is especially difficult, as we have no data sources we can look to for what the correct text should be. We will solve this problem in the short term by implementing a rule that OU fields in certificate Subjects must contain only printable ASCII characters; any OU field that doesn’t meet this condition will be dropped from the certificate request prior to issuance. This is a simple, reliable, automated safeguard against issuing certificates with OU fields that contain Mojibake. Customers who want their OU fields present can contact us with an OU string that contains only printable ASCII characters, and we will reissue the certificate accordingly. In the longer term, OU fields will be going away due to SC47v2.
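The short-term OU rule reduces to a one-line filter (a sketch of the stated rule, not the actual implementation):

```python
def strip_non_ascii_ou_fields(ou_values):
    """Drop any OU value containing a character outside printable ASCII
    (0x20-0x7E), per the short-term rule described above (illustrative)."""
    return [ou for ou in ou_values
            if all(0x20 <= ord(ch) <= 0x7E for ch in ou)]
```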

Assignee: bwilson → tim.callan
Whiteboard: [ca-compliance]
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #9235150 - Attachment description: 202107806_Mojibake.xlsx → 20210806_Mojibake.xlsx
Attachment #9235150 - Attachment filename: 202107806_Mojibake.xlsx → 20210806_Mojibake.xlsx

We are working on a release schedule for the Mojibake fix described in comment 1. We will provide an update as we solidify plans.

If a Subject Name has been flagged for Character Set Review on a previous occasion and okayed by Validation, then this Subject Name will be flagged as approved and future issuance can go through automatically without requiring manual Character Set Review

Am I understanding correctly that the system design is such that a single mistake, by a single Validation Specialist, may then propagate indefinitely undetected?

Given the concerns here regarding human error, including at Sectigo, is there any consideration to a more in-depth review, self-audit, and cross-examination of these exceptional situations? It seems like it's prematurely been optimized for ease of issuance, rather than correctness, and that seems somewhat concerning.

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #3)

Am I understanding correctly that the system design is such that a single mistake, by a single Validation Specialist, may then propagate indefinitely undetected?

No, that’s not correct. “[I]ssuance can go through automatically without requiring manual Character Set Review” is meant to indicate that once the normal process has been followed (remember, this check occurs after our validation process has occurred), then issuance can proceed for that exact same subject without a programmatic block. It is not an automatic pass without undergoing validation.

[I]s there any consideration to a more in-depth review, self-audit, and cross-examination of these exceptional situations?

Character Set Review is exactly that.

As a final point, we have targeted September 18 for release of this functionality.

Flags: needinfo?(tim.callan)

(In reply to Tim Callan from comment #4)

(In reply to Ryan Sleevi from comment #3)

If a Subject Name has been flagged for Character Set Review on a previous occasion and okayed by Validation, then this Subject Name will be flagged as approved and future issuance can go through automatically without requiring manual Character Set Review
Am I understanding correctly that the system design is such that a single mistake, by a single Validation Specialist, may then propagate indefinitely undetected?

<snip> It is not an automatic pass without undergoing validation.

I restored the quoted text and added emphasis. Put differently, I'm trying to understand whether "Character Set Review" is an activity performed by a single validation specialist, who may be the same one requesting the certificate.

From your process, it sounds like if it contains characters outside the ASCII range, it gets flagged for "Character Set Review". A Validation Specialist will review that, and either decide to approve it (at which point, it gets added to an allowlist for subsequent certificates) or deny it (and the request is rejected). This validation is performed by one validation specialist, and once the review is completed, that particular value is indefinitely allowlisted to bypass "Character Set Review".

Is that a correct understanding?

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #5)
Your description of the process is correct. We believe this process will have a high accuracy rate, as we’re catching Mojibake suspects programmatically, highlighting suspect strings in the interface, and forcing resolution for each instance before issuance can commence. We do expect many false positives as text in languages such as Simplified Chinese, Korean, and Cyrillic will be flagged every time. We have not found a reliable method for programmatically differentiating between these languages and Mojibake strings.

To confirm our belief, for the first six months after deployment we have committed to internal audit review for 100% of the instances of Character Set Review that occur. Should internal audit detect a problem with a Character Set Review, we will change the ruling on that suspect string and look for broader lessons to be learned.

So far as we can tell, this is untrodden ground. As no other CA appears to have dealt with this, we don’t have another source of ideas apart from what we’ve worked out for ourselves. With six months of data, we will be much better positioned to determine how the process is working and if changes are appropriate.

Flags: needinfo?(tim.callan)

Tim: It seems like there are a lot of options here, looking at existing lessons from comparable CA features.

You could, for example, require it be escalated to a more senior team. You could require multi-party review. You could have weekly cross-check reviews of all new strings approved, along with a periodic random sampling. These are examples of "defense in depth" thinking, which tries to prevent a single error from cascading indefinitely, and to spot check processes.

On the "What makes something Mojibake versus intentional" question, there are various heuristics that can flag likely Mojibake. For example, FTFY is a permissively-licensed (MIT) library for Python that applies some quite clever regular expressions to detect the probability of Mojibakeness. You could, for example, subject those strings to additional scrutiny and review. Equally, I would imagine that those strings in Simplified Chinese and Korean will at least be using a common subset of scripts, so you could equally look for mixed scripts as another possible sign. This is perhaps less useful for Cyrillic (which may see Latin and Cyrillic scripts mixed), but still.
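The mixed-script idea can be approximated with the standard library alone (a rough sketch; a production check would use proper Unicode script properties, e.g. `\p{Script=...}` in the third-party regex module, rather than character-name keywords):

```python
import unicodedata

SCRIPT_KEYWORDS = ("CJK", "HANGUL", "CYRILLIC", "LATIN", "GREEK")

def scripts_of(text):
    """Coarse set of scripts present, inferred from Unicode character names."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue                  # ignore digits, punctuation, symbols
        name = unicodedata.name(ch, "")
        for script in SCRIPT_KEYWORDS:
            if script in name:
                found.add(script)
                break
    return found

def looks_mixed_script(text):
    """Mixing scripts in one value is one possible Mojibake/homoglyph signal."""
    return len(scripts_of(text)) > 1
```

A Hangul-only organization name yields a single script, while a Latin name with a stray Cyrillic letter yields two and would be flagged.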

Mostly, the reason I was/am pushing hard on you here is that "one mistake propagates indefinitely" feels like an anti-pattern. I'm not opposed to giving validation specialists the autonomy and trust (and, presumably, training) to make calls, but this seems like a "measure thrice, cut once" sort of scenario: you want to design controls that detect and prevent badness from propagating or going undetected.

Do these ideas seem workable?

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #7)
Now you’re getting into the realm of less specific future plans. We were viewing this as a two-stage process, the first of which we laid out in comment 1. The availability of a tool like FTFY introduces a potential additional stage, which we’ll describe below (number 2 in that list).

Our present straw man is a three-stage process for responding to Mojibake. These stages are:

  1. Detection and dispositioning. Execute the plan described in comment 1 and comment 6. This will go a long way in addressing the immediate need and position us to learn more about how these mangled characters come into the system and how best to respond to them.
  2. Preissuance linting. We believe FTFY or something similar (if that something exists) could be a valuable Guard Rail. Prior to conducting an evaluation or gathering our own data, we will have to assume that such a tool will yield false positives. In that case any order failing the linter would then go to escalation for a verdict. Before developing this linter we would gather actual hard data to understand expected behavior.
  3. Guided correction. We might also be able to use FTFY and/or our own data to determine a recommendation for the correct text and offer it programmatically for human approval. Again, this is just an idea for now, which we would have to examine more closely before actual execution.
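The guided-correction idea in stage 3 can be illustrated for the most common Mojibake case (UTF-8 read as Latin-1) with a simple round-trip heuristic; FTFY generalizes this far more robustly, and this sketch is illustrative only:

```python
def suggest_mojibake_repair(value):
    """If re-encoding as Latin-1 and decoding as UTF-8 yields different valid
    text, the input was likely Mojibake; return the candidate repair.
    Otherwise return the input unchanged. (Heuristic sketch, not production.)"""
    try:
        candidate = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value                  # not the Latin-1/UTF-8 pattern
    return candidate if candidate != value else value
```

The repaired form would be offered programmatically for human approval rather than applied automatically.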

We feel the immediate path forward is to execute on stage 1 as described earlier. When we are in production with stage 1, we will begin investigating our active certificate base to identify certificates with this sort of misissuance for replacement. This exercise will also help us build a real-world sample set of non-printable ASCII from certificate requests.

We are interested in taking this collection of strings and running it through FTFY to get a sense for that tool’s results. Based on what we see there we will continue to examine the possible stages 2 and 3 mentioned above.

WRT isolating a language like Korean or Simplified Chinese, we would need to look into it a little to determine how effective that would be. It’s worth investigating.

We welcome ideas like these and more from the community as we search for efficient and effective ways to identify and correct Mojibake in our certificate requests.

Flags: needinfo?(tim.callan)

Ben: Based on Comment #4, it seems the Next Update is for 2021-09-18.
While Comment #8 outlines some ideas, it does not seem firm dates are identified for that yet, other than no schedule is expected sooner than 2021-09-18.

Thanks for the update, Tim.

I don't have further questions, but I also don't have a high opinion of the plan outlined in Comment #1 and Comment #6, and am concerned that Sectigo may not have fully internalized the threats and lessons from the ecosystem in their process design. That's not because I think they are bad things to do, and it's not that I think #2 and #3 from Comment #8 need to be delivered first. On their face, they are all good things.

The concern I have is that it seems the "Character Set Review" as designed places a lot of dependency on human factors, which if anything have been shown repeatedly to be flawed, and allows those flaws to propagate. Comment #6 I believe demonstrates the typical CA anti-pattern of post-issuance review. Specifically:

To confirm our belief, for the first six months after deployment we have committed to internal audit review for 100% of the instances of Character Set Review that occur.

Comment #7 hopefully makes my views clear on this (re: "You could")

Conceptually, my hope and belief is for CAs to optimize for "What's the worst that could happen, and design appropriately". The Phase 1 plan outlined in Comment #1 and Comment #6 seems to reveal a thinking of "Optimize for efficiency and assume the best". Understandably, finding the right balance between those extremes is a challenge - but these incident reports are the opportunity for the CA to demonstrate their priorities and what they optimize for. And in that regard, I think the responses are not at all encouraging.

As much as I would love to see Sectigo take a more proactive role here with Mojibake (i.e. multiple layers of human review, until such a time as multiple layers of technical controls exist), my concern is less with the specifics of this incident, and more what it reveals about how Sectigo is approaching incident management in general.

I realize this is a lot of text, and it may seem confusing why I'm saying bad things about good ideas, but hopefully I've explained why it's not that the ideas aren't good, it's that they don't feel good enough.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] → [ca-compliance] Next update 2021-09-18

QA just found a showstopper in our release candidate for Mojibake phase 1. That means we will not deploy in this coming weekend’s window. At this point we expect to push the release back by one week.

Our Mojibake code has passed QA, and we expect to deploy it in this weekend’s window.

The Character Set Review release occurred on Sunday as scheduled and is now in production. We are gathering reviewed strings in our database for future analysis.

(In reply to Ryan Sleevi from comment #9)
We’ve been scratching our heads over this comment and need to ask for clarification.

To confirm our belief, for the first six months after deployment we have committed to internal audit review for 100% of the instances of Character Set Review that occur.

Comment #7 hopefully makes my views clear on this (re: "You could")

(From comment 7)

You could, for example, require it be escalated to a more senior team. You could require multi-party review. You could have weekly cross-check reviews of all new strings approved, along with a periodic random sampling. These are examples of "defense in depth" thinking, which tries to prevent a single error from cascading indefinitely, and to spot check processes.

In the quoted text above we stated that our internal audit team would review 100% of these instances. That is an example of “’defense in depth’ thinking, which tries to prevent a single error from cascading indefinitely, and to spot check processes.” It is, in fact, considerably more thorough than “periodic random sampling.”

Conceptually, my hope and belief is for CAs to optimize for "What's the worst that could happen, and design appropriately". The Phase 1 plan outlined in Comment #1 and Comment #6 seems to reveal a thinking of "Optimize for efficiency and assume the best". Understandably, finding the right balance between those extremes is a challenge

We didn't mention this previously, but the efficiency trade-off in the Phase 1 plan is based on data, not premature optimization. We ran queries on crt.sh to find all the certificates we'd issued in the previous ~6 months whose Subject attributes contained any characters outside the set of printable ASCII characters, and we used those data to project the impact of Character Set Review on our Validation team's workload. It became clear that forcing repeated Character Set Review of the same identical Subject organization name and address details every time would be impractical without a radical change in resourcing, whereas a one-time review of each unique instance would accomplish the same goal and, crucially, actually could be implemented.
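The trade-off described (reviewing every order versus a one-time review of each unique Subject value) is easy to quantify; a toy illustration with invented numbers:

```python
# Toy illustration (invented data) of the workload trade-off described above:
# many orders repeat the same Subject details, so deduplication dominates.
orders = (["Örebro Widgets AB"] * 1800
          + ["Müller GmbH"] * 300
          + ["Łódź Sp. z o.o."] * 50)

reviews_every_time = len(orders)            # review on every order
reviews_once_per_unique = len(set(orders))  # review each unique value once

print(reviews_every_time, reviews_once_per_unique)  # 2150 3
```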

Rather than allowing fear of a very small potential error rate to prevent the implementation of a greatly improved system and process, we drove an aggressive schedule to launch a system that was a vast improvement over the then-current status quo and to begin gathering the learning we would need to improve further.

I would love to see Sectigo take a more proactive role here with Mojibake (i.e. multiple layers of human review, until such a time as multiple layers of technical controls exist)…

Our stated plan includes multiple layers of human review until such a time as adequate technical controls exist: The regular OV/EV validation process should catch most instances of Mojibake; Character Set Review is a second layer of human review, and 100% review by our internal audit team is a third layer. This plan will enable us to gather data in the form of a real-world set of Mojibake strings included in certificate requests, a success rate for Character Set Review, and possibly other observations on the nature and origin of these strings and the best practices in responding to them. These additional data will inform our next steps in this effort.

We have announced our intention to benchmark the effect of FTFY using this real-world dataset, as it is an important candidate component for multiple technical controls. The benchmarking results will impact our strategy for where and how to use this asset in our programmatic controls.

Regarding proactivity: Our proactive investigation of our own certificate base for potential error cases led to the discovery of this phenomenon. We proactively reported our findings to the community along with a strong first step in mitigating the problem while beginning the learning process for further improvement. We quickly implemented our stage one solution, and so far as we can tell, we are the only CA to have tackled this problem in any way. None of this was in response to other CAs’ activity or bugs, nor to suggestions or prompting from any other member of the community. It was fully proactive on our part, as just one element of our ongoing effort to isolate, characterize, and mitigate sources of misissuance risk in our processes and systems.

my concern is less with the specifics of this incident, and more what it reveals about how Sectigo is approaching incident management in general.

We have invested a meaningful level of resource and rigor in analyzing the Mojibake issue and planning our response. We looked at technical controls and determined that they alone will not suffice, since they are probabilistic in nature rather than deterministic and can yield both false negatives and false positives. We therefore concluded that an additional layer of human review of all potential Mojibake was the only available baseline response, and so "Character Set Review" was born and announced. We responded to feedback by adding a third layer of human review (post-issuance, by our internal audit team). We had always intended to look again, after gathering more data, at how technical controls could add further robustness; and we're grateful that you pointed us towards FTFY because our first impression is that it should do a better job than any other Mojibake detector we had found previously.

So clearly we have some kind of disconnect here. Ryan, in what way does the process we have implemented and will continue to enhance differ from the suggestions you have made in the quoted sections above, and in what way does it suggest a poor approach to incident management?

Flags: needinfo?(ryan.sleevi)

There has been a good amount of discussion on this topic. Are there any other questions or comments from the community we can address?

(In reply to Tim Callan from comment #13)

In the quoted text above we stated that our internal audit team would review 100% of these instances. That is an example of “’defense in depth’ thinking, which tries to prevent a single error from cascading indefinitely, and to spot check processes.” It is, in fact, considerably more thorough than “periodic random sampling.”

I think there may be some confusion regarding the exact sequence of events; the description of what you plan to do seems to suggest something roughly akin to "post-issuance linting", while my description of process flows was arguing for "pre-issuance linting" via controls prior to issuance.

Rather than allowing fear of a very small potential error rate to prevent the implementation of a greatly improved system and process, we instead drove an aggressive schedule to launch a system that was a vast improvement over the then status quo and begin gathering the learning we would need to further improve.

And this is a risk/reward tradeoff that all CAs make: do they externalize risk to relying parties, while accepting the reward from the ease of issuance?

What's not clear from this incident report, and which can only be inferred, is an understanding of what risks Sectigo anticipates, and how it sees them being balanced. For example, I highlighted a risk that we see as incredibly common in any single-person validation scenario: a mistake is made. Sectigo's approach to balancing that risk appears to be a post-issuance audit, which doesn't mitigate the risk, but at best tries to detect it after the fact.

That means the risk of misissuance is seen as acceptable, but the rationale for that is unclear. Equally, we've seen plenty of incidents where CAs, while well-intentioned, have failed to appropriately consider a portion of their certificates in such post-issuance evaluations (e.g. think of all the not-revoked internal server name certificates). A pre-issuance check can, to some extent, mitigate that, as can extensive descriptions of the systems and controls.

It's not to suggest that you aren't attempting to correct these issues, but rather that the philosophy appears to prioritize issuance and post-facto detection, rather than prevention. Comment #13 suggests that a cost/benefit analysis has been performed as to why that's the best approach, but it's not entirely clear what that analysis is, and that doesn't really help inspire confidence in the CA. It's not proof of something bad happening - but it doesn't seem to reassure that the CA has thought through all the bad things that could happen, and has controls in place.

my concern is less with the specifics of this incident, and more what it reveals about how Sectigo is approaching incident management in general.

We have invested a meaningful level of resource and rigor in analyzing the Mojibake issue and planning our response. We looked at technical controls and determined that they alone will not suffice, since they are probabilistic in nature rather than deterministic and can yield both false negatives and false positives. We therefore concluded that an additional layer of human review of all potential Mojibake was the only available baseline response, and so "Character Set Review" was born and announced. We responded to feedback by adding a third layer of human review (post-issuance, by our internal audit team). We had always intended to look again, after gathering more data, at how technical controls could add further robustness; and we're grateful that you pointed us towards FTFY because our first impression is that it should do a better job than any other Mojibake detector we had found previously.

So clearly we have some kind of disconnect here. Ryan, in what way does the process we have implemented and will continue to enhance differ from the suggestions you have made in the quoted sections above, and in what way does it suggest a poor approach to incident management?

You mention the existing OV/EV review should catch most instances, but we also know from this bug that it doesn't, and hasn't, so it doesn't necessarily seem to be an effective control here. The primary control for prevention is Character Set Review, and the primary control for detection is the post-issuance audit. It's unquestionably good to have controls for both prevention and detection, and the question being raised here is whether these controls are reasonably sufficient to achieve that.

I tried to describe how we can learn from past CA incidents to reasonably question how effective Character Set Review will be at prevention, and how those errors can propagate. The post-issuance audit, for detection, attempts to mitigate the propagation, but it is fundamentally not a prevention mechanism, yet seems to be presented as such. Taken together, it appears Sectigo believes the combination is sufficient, relative to the costs that it would otherwise incur (e.g. validating every name, every time), but that calculus is not clear.

Do I think we know enough information to guarantee unambiguously a solution that works for 100% prevention? No, we may not yet know that. But that's the goal and expectation, and we'd hopefully see solutions that optimize to get as close to that as possible, and to be explicit when and why that's not achievable.

Flags: needinfo?(ryan.sleevi)

Here is an update covering the first three weeks of Character Set Review, to give the community an idea of what it looks like in practice.

In the first week we received orders for about 2,150 certificates with characters outside the set of printable ASCII characters in one of the designated fields. All received Character Set Review, as they were all new to the system. In the second week we again received orders for about 2,150 certificates with these trigger characters in the designated fields, of which only 351 required Character Set Review. Week three brought another 2,300 orders with trigger characters in the designated fields, of which 360 required Character Set Review.

We performed our internal audit on all Character Set Reviews from the first three weeks and found no errors.

As discussed earlier in this bug, we view implementation of our Mojibake response as an iterative, learning process. At present, if we discover characters outside the set of printable ASCII characters in OU fields, we simply issue the certificate without the field. We originally chose this approach for a few reasons.

  1. It is clean and guaranteed to be error-free.
  2. We could programmatically implement it using software.
  3. OU fields are optional and will be phased out less than a year from now anyway.

By operating with this functionality in place we have discovered that it is disruptive to some customers who depend on the OU field for technical or process reasons. This has had the advantage of identifying some implementations that will need updating before next year’s deprecation of OU goes into effect. However, it has the disadvantage of creating this disruption well in advance of that actual deprecation.

With the benefit of further consideration, we believe our original approach was unnecessarily invasive. We are not the right ones to draw conclusions about the contents of a Subscriber’s OU field. So long as it does not contain an unverified brand name or domain name, what the OU field says is not really the CA’s business to judge. For all we know, the Subscriber has deliberately included what appears to be garbage text to our eyes and is expecting that specific text.

Therefore, we will change our approach. Rather than detecting and eliminating OU fields containing characters other than printable ASCII, we will simply ignore the contents of OU fields for purposes of Character Set Review. To be specific, we will:

• Cease to check OU fields for these characters
• Cease to remove OU fields with these characters
• Retain all processes and functionality related to OU fields from prior to our implementation of Character Set Review

As mentioned, we have used this as an opportunity to help these Subscribers understand that their dependence on OU fields must be eliminated before September 2022. We will continue to socialize the upcoming OU deprecation with our customers.

We don’t have a specific target date for this change yet. We will share that date once available.

The update to treatment of OU fields described in comment 17 is scheduled for this coming weekend’s deployment window.

Our updated treatment of OU fields as described in comment 17 went live this weekend.

As mentioned earlier in this thread, finding the optimal procedure for preventing Mojibake characters from inclusion in certificates is an iterative, experience-based, learning process. At this point we have performed thousands of instances of Character Set Review with our internal audit confirming a 100% success rate.

We also have analyzed the content of the Subject information in Character Set Review. This real-world data set shows us that most instances of Character Set Review are driven by a set of words that repeat reasonably frequently. These usually are words descriptive of organizations, such as Académie; place names such as Jönköping; or common words like für. As we definitively understand these words to be legitimate text rather than Mojibake, there is no utility in any of them triggering Character Set Review.

Therefore we are creating functionality to support a Mojibake Exception List. Words on the Mojibake Exception List will not on their own trigger Character Set Review. This will apply strictly to distinct words in proposed Subject details that exactly match words on the list. Our definition of a distinct word in this case is a string of characters bounded on each side by an ASCII space or punctuation mark.

Note that the presence of a word from the Mojibake Exception List in an applicable Subject field will not in any way interfere with Character Set Review if it would have been triggered by other potential certificate contents. For example, if a certificate request contains the word für (as we plan to include this word in the Mojibake Exception List) and also the string örsä, the presence of für will in no way prevent Character Set Review.
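The word-matching rule above can be sketched as follows. The splitting rule, list contents, and function names are assumptions for illustration only.

```python
import re
import string

# Sketch of the Mojibake Exception List rule described above. A "distinct
# word" is delimited by ASCII spaces or punctuation; a listed word does not
# trigger Character Set Review by itself, but any other word containing
# extended-set characters still does. List contents are examples only.

MOJIBAKE_EXCEPTION_LIST = {"für", "Académie", "Jönköping"}

def distinct_words(value):
    # Split on ASCII whitespace and ASCII punctuation.
    return re.findall(r"[^\s" + re.escape(string.punctuation) + r"]+", value)

def triggers_review(value):
    for word in distinct_words(value):
        has_extended = any(ord(ch) > 0x7E for ch in word)
        if has_extended and word not in MOJIBAKE_EXCEPTION_LIST:
            return True
    return False
```

With this rule, a field containing only `für` among its extended-character words would not be flagged, while a field also containing `örsä` still would, matching the für/örsä example above.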

We have used real-world Character Set Review contents to build an initial list of two hundred or so words containing extended-set characters to populate the Mojibake Exception List. These are all easily established as real words meeting the criteria given in the second paragraph of this post. Had we started with this functionality in place and these words in the exception list, we could have removed more than half the instances of Character Set Review undertaken. We believe that as we gather more Subject details, we can expand this list to remove more than 90% of Character Set Review instances, all of them unnecessary as the relevant fields will exclusively contain text known not to be Mojibake.

We are targeting release this year. As an aside, we are continuing our work on FTFY to implement a pre-issuance check using this tool. We intend to apply this check to all certificates, including those containing words from the Mojibake Exception List.

We continue to work on code updates for managing Mojibake, including the Exception List functionality and our FTFY pre-issuance check mentioned in comment 20. We are still targeting the Exception List functionality for release this year.

Our Exception List functionality as described in comment 20 is still underway. We’re considering some competing priorities as we settle on a delivery target. We will keep the community informed as we determine that date.

We have an official target date for our Exception List functionality of January 8. Ben, we’d like to ask for a Next Update on January 11.

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2021-09-18 → [ca-compliance] Next update 2022-01-11

We deployed our Mojibake exception list functionality, as described in comment 20, last weekend.

Unless there are other remediation items or issues to discuss, I will close this case on Friday, 14-Jan-2022.

Flags: needinfo?(bwilson)
Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2022-01-11 → [ca-compliance] Next update 2022-01-15
Flags: needinfo?(bwilson)

Ben,

We still have a project to build a pre-issuance checker based on FTFY. Our research has shown that it is a viable tool but is not suitable as a Mojibake pre-issuance checker without some additional work. We feel this information is potentially valuable to the community and would like to keep this bug open for the time being.

Thank you.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] Next update 2022-01-15 → [ca-compliance] Next update 2022-02-15

On January 13, in apparent response to our work on Mojibake, we received a report of 21 certificates in which the Unicode Replacement Character (U+FFFD) appears, often displayed as a question mark in a black rhombus. These certificates are no longer active.

Our previous investigation of certificates with Mojibake did not find these certificates, and a test run of the certificates through FTFY did not flag them as errors. This is not surprising, as the main goal of FTFY is to take in bad Unicode and output good Unicode. As there is no clear replacement for U+FFFD, FTFY cannot accomplish that task in this case and does nothing.

In our previous efforts in this bug, we implemented a Character Set Review process. All certificates reported in this batch were issued before Character Set Review was enabled in production. Unlike FTFY, Character Set Review is able to catch these errors and prevents any additional certificates from being issued with this defect.
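The distinction drawn above can be shown with a minimal check: a fixer such as FTFY has no original character to restore for U+FFFD, but the codepoint itself is trivially detectable. This is an illustrative sketch, not production code.

```python
# U+FFFD has no recoverable original, so a mojibake *fixer* leaves it
# alone, but a character-set *check* can still block it outright.
REPLACEMENT_CHAR = "\ufffd"

def contains_replacement_char(subject_fields):
    return any(REPLACEMENT_CHAR in value for value in subject_fields.values())

bad = {"organizationName": "Acme \ufffd GmbH"}
good = {"organizationName": "Acme GmbH"}
```

A check like this, running pre-issuance on every Subject field, would have blocked all 21 certificates in this batch.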

Timeline

January 13, 2022 - 14:22 UTC
Sectigo receives a report from another CA that the Unicode Replacement Character (U+FFFD) was included in Sectigo certificates.

January 13, 2022 - 14:57 UTC
We send a reply to the reporter that we are investigating.

January 13, 2022 - 17:00 UTC
Our internal investigation is concluded, finding a total of 21 certificates with this replacement character.

January 13, 2022 - 17:29 UTC
We initiate a revocation event scheduled for January 17, 2022 at 16:00 UTC.

January 17, 2022 - 16:22 UTC
Revocation complete. A total of 19 certificates are revoked. Two of the discovered certificates expired on January 14 and January 15 and are not reported here as revoked.

January 17, 2022 - 16:46 UTC
We create a ticket to block issuance of certificates containing Unicode Replacement Character (U+FFFD) in any subject field and to disallow approval of this character in our Mojibake omission list.

Affected certificates

Serial Number Certificate Precertificate
735CC010A9D232DD04B51F470DAC075B Certificate Precertificate
35CFB3E8D184698EC36142132C6575F3 Certificate Precertificate
0080F81A30391A6B99DB968D5C723E0608 Certificate Precertificate
00EEDC3D7EF17CC8C7FA0A973A1EB365CA Certificate Precertificate
407B915FE14EB8113CA7BFBA17FAE573 Certificate Precertificate
30BF60BFA1645518EE9F9D1317919BCA Certificate Precertificate
14E2BBCA8873A7B2F5BDFECF99FFF462 Certificate Precertificate
2A1777C87FA6D559DF0A98DFD3F494BD Certificate Precertificate
00E3BCC9DE027B2705F09B9967CD8807CA Certificate Precertificate
2748D840DF8C5B2204AC4726D3676E61 Certificate Precertificate
00BE14E9729E4B45E2D1E0AC58BB3475EA Certificate Precertificate
0092D399E71DCEE06567AABAD4AC7E285A Certificate Precertificate
00E1AD4464B3E33CEB5311A87FFEF73E0C Certificate Precertificate
00906C2E3E3D42E5D66802A8E21B41C65F Certificate Precertificate
00CE82A381F0092091B9B8B3C24E61C021 Certificate Precertificate
076C8147F91F390A252B5865A027E414 Certificate Precertificate
401E191C3F84EDF5F746159EA24E5A76 Certificate Precertificate
0DB6ACD3B0C360CCB6C020F45498EE65 Certificate Precertificate
2D31B75D086EC71DA56A9344FACF1C45 Certificate Precertificate

In reviewing our ongoing response to Mojibake, we realized that we did not report a batch of certificates we revoked in November.

These were 20 certificates, issued between November 28, 2019, and September 9, 2021, containing non-printable ASCII characters in Subject fields. We discovered them by internal review on November 3, 2021, at 16:22 UTC and revoked them on November 8, 2021, at 15:14 UTC.

Serial Number Certificate Precertificate
00BCC283627D57444CABFCFC85D582222A Certificate Precertificate
00EE9C1952D59487E294E896EF8278572F Certificate Precertificate
00BB74F3CA2D0B83F168780E945FF8E37A Certificate Precertificate
00B64EB13AE7DEC014ABC07C1D02B9C0E9 Certificate Precertificate
5B7366311DF7EAFA4614C2A2736F16F8 Certificate Precertificate
1112066A15EA3442D1F470162BEF1F18 Certificate Precertificate
0088E90F58593A594B00F0281AF90FEBC7 Certificate Precertificate
00B1EF4D95D61A1CD25AA532100D9950B7 Certificate Precertificate
00F96A1C15BCDC86448A71289719DA3C15 Certificate Precertificate
00FA5D347B88391BEF180DE8D3D51633C7 Certificate Precertificate
736C5F84301481B252F8BB7CC2533A97 Certificate Precertificate
00D6AA6453170DC327905461408777A378 Certificate Precertificate
11586B1EE7C7AFB2E1766309E08A1E5D Certificate Precertificate
3EBC97EA9561A0A95F5BBEE3A7D388CB Certificate Precertificate
00EB407F86391F327EC0BB62C0C11ECB72 Certificate Precertificate
00C735842A11DFC06C71BD60BA9C249447 Certificate Precertificate
00BB384FAA4E92410A3713363CB068BA98 Certificate Precertificate
009583406D03BCD102468593C0ACEAB17F Certificate Precertificate
5DFFDB92794F34E7934E230025222651 Certificate Precertificate
00EE4B21ED6F23A8078D8AA51D56E03E8A Certificate Precertificate

In the past few weeks we have populated our Mojibake Exception List with thousands of distinct words that should not by themselves trigger Character Set Review, each of which was reviewed by two separate validation agents prior to inclusion. The consequence of populating this list is that our incidence of Character Set Review has been reduced by more than 90%.

This last weekend we deployed an automatic pre-issuance check for a set of Unicode characters to be excluded from certificates. These are characters for which we see no place in the common text that occurs in certificates. For example, we can’t foresee a circumstance where the reversed pilcrow sign would be appropriate in a certificate. The presence of one of these characters in any Subject field will block issuance and send the order back to validation for review. We can update this table in the future if we discover additional characters to block or determine that a blocked character should be allowed.

The list of blocked characters at release time is available in attachment 9264009 [details].
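A check of this kind can be sketched as follows. The actual blocked set is the one in the attachment; the codepoints below (U+204B, the reversed pilcrow mentioned above, and U+FFFD) are examples only, and the names are assumptions.

```python
# Illustrative pre-issuance blocklist check. The real blocked-character
# table is in attachment 9264009; the codepoints and names here are
# examples for illustration only.

BLOCKED_CODEPOINTS = {0x204B, 0xFFFD}

def blocking_characters(subject_fields):
    """Return (field, character) pairs that should block issuance and
    send the order back to validation for review."""
    return [
        (name, ch)
        for name, value in subject_fields.items()
        for ch in value
        if ord(ch) in BLOCKED_CODEPOINTS
    ]
```

Keeping the blocklist as a plain set of codepoints makes the promised future updates a data change rather than a code change.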

This last weekend we deployed an automatic pre-issuance check for Mojibake characters based on the FTFY tool. As previously mentioned in comment 26, the substitutions FTFY makes are not all suitable for a pre-issuance check as is, because FTFY makes substitutions even where the original character is acceptable for inclusion in a certificate. That is not entirely surprising, as we are using FTFY for a somewhat different purpose than its original intention. As an example, FTFY on its own substitutes a generic double quotation mark for a left or right double quotation mark (\u201C or \u201D, respectively). There is nothing wrong with including a left or right double quotation mark in a certificate instead of a generic double quotation mark, and therefore it would be inappropriate to block issuance because this character is included.

Our implementation includes a false positive table to prevent these characters from blocking certificate issuance. We can add characters to or remove characters from this table if we determine that such a change is needed. This table’s contents at release time are available in attachment 9264189 [details].

There was debate earlier in this thread about the effectiveness of Sectigo’s approach to mitigating the Mojibake problem. Now that we have completed our mitigation of Mojibake and other identified content that we believe should not be included in certificates, it is worthwhile to look back on how Sectigo’s chosen approach facilitated our arrival at an effective automated protection from disallowed strings of Mojibake and UTF-8 substitution characters.

Beginning with Character Set Review had the benefit of allowing us to implement a solution quickly to greatly mitigate the Mojibake problem. As “finding the optimal procedure for preventing Mojibake characters from inclusion in certificates is an iterative, experience-based, learning process,” it also had the advantage of advancing our understanding of the problem, how it manifested itself in the real world, and the effectiveness of our actions in response. We adjusted our responses based on lessons learned from our operation of Character Set Review, including removing OU fields from the scope of monitoring (comment 17) and adding the Mojibake Exception List (comment 20).

The most important learning for us, however, was the data set of actual suspect strings that we were able to create through Character Set Review. As stated in comment 6,

for the first six months after deployment we have committed to internal audit review for 100% of the instances of Character Set Review that occur… With six months of data, we will be much better positioned to determine how the process is working and if changes are appropriate.

And then again in comment 8 we said,

This exercise will also help us build a real-world sample set of non-printable ASCII from certificate requests.

This real-world data set turned out to be essential to our effort to design and implement a reliable and broad-reaching response to the problem described in the first paragraph of this post. It became our test bed for assessing FTFY as a misissuance detection tool. Our early tests of FTFY showed a high false positive rate for blocking issuance, owing to a set of characters that FTFY substituted but that were perfectly appropriate for inclusion in Subject fields without creating any chance for misunderstanding or misinterpretation. These characters became the starting point for our FTFY Exception List as described in comment 33.

Operating Character Set Review as a process also allowed us to identify a set of characters that FTFY did not detect but we nevertheless wanted to eliminate from certificates. These became the contents of the Unicode Blocklist described in comment 31.

With these automated systems now in place, we can audit their behavior to ensure they are performing as expected. Once we are assured they are, we will be able to deprecate Character Set Review in favor of this entirely automated process, eliminating the possibility for human error from the process entirely.

After deploying our FTFY checker, we ran a query based on FTFY for certificates we may have missed in our earlier querying. We discovered one certificate with Mojibake in one of its Subject fields. This certificate has been revoked.

Serial Number Certificate Precertificate
2107E4C96DE4FD4FB96E5406BF9CFA78 Certificate Precertificate

We were surprised to see this certificate as it was issued in November 2021 and would have been expected to show up in our internal audit of Character Set Review. We have looked into the matter and discovered a “same-day reissue” error in our script to pull data for internal audit of Character Set Review results. This script missed any character set review for certificates that were reissued the same day as initial issuance, as was the case with this certificate. This error is fixed in our script, although our plan is to deprecate human-based Mojibake monitoring in the near future in favor of our new, automated method.

Timeline:
February 16, 2022, 19:58 UTC – Query began of the certificate base for possible misissued certificates with Mojibake
February 17, 2022, 17:31 UTC - Query completed for possible misissuance and report sent for review
February 17, 2022, 17:42 UTC – Review of possible misissued certs completed; one certificate confirmed as misissued
February 21, 2022, 17:27 UTC – Certificate revoked

This concludes our investigation and remediation for this issue.

Comment #34 says:

This script missed any character set review for certificates that were reissued the same day as initial issuance, as was the case with this certificate

Can you share more detail about how this happens? This seems like such an edge case of a bug (that is, the situation doesn't seem an edge case, the bug does) that it's useful to understand how such a mistake could creep in.

Comment #31 says:

The presence of one of these characters in any subject field will block issuance and send the order back to validation for review.

Can you share what precisely the policies are for validation's review?

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #35)
This post is to acknowledge this question and let you know we’re working on a response.

Flags: needinfo?(tim.callan)

(In reply to Ryan Sleevi from comment #35)

This script missed any character set review for certificates that were reissued the same day as initial issuance, as was the case with this certificate

Can you share more detail about how this happens? This seems like such an edge case of a bug (that is, the situation doesn't seem an edge case, the bug does) that it's useful to understand how such a mistake could creep in.

The script is looking for new certificates since the last time it ran. By the logic of the script a replacement certificate is not treated as a new certificate. Because the original certificate had been replaced by the time the script ran, the script didn’t pick up that certificate either. This problem could only occur if a certificate was both initially issued and replaced without our script running between the two events. As the script runs daily, this is a rare event. We know from our scan of our active certificate base as described in comment 34 that this error occurred exactly once.
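The failure mode described above can be reconstructed in miniature. The data model and field names here are hypothetical, not Sectigo's schema.

```python
from datetime import datetime

# Hypothetical reconstruction of the audit script's selection logic and
# the same-day-reissue gap described above. Field names are illustrative.

def certs_to_audit(certs, last_run, now):
    picked = []
    for cert in certs:
        if cert["is_replacement"]:
            continue  # replacements are not treated as new certificates
        if cert["replaced"]:
            continue  # original already superseded when the script runs
        if last_run < cert["issued"] <= now:
            picked.append(cert["serial"])
    return picked

# A certificate issued and replaced between two daily runs matches
# neither branch, so it escapes the audit entirely.
original = {"serial": "A1", "issued": datetime(2021, 11, 2, 9, 0),
            "is_replacement": False, "replaced": True}
reissue  = {"serial": "A2", "issued": datetime(2021, 11, 2, 11, 0),
            "is_replacement": True, "replaced": False}

run = certs_to_audit([original, reissue],
                     last_run=datetime(2021, 11, 1, 16, 0),
                     now=datetime(2021, 11, 2, 16, 0))
```

Because the script runs daily, both conditions coinciding requires issuance and replacement inside a single 24-hour window, which is why the gap surfaced only once.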

The presence of one of these [] characters in any subject field will block issuance and send the order back to validation for review.

Can you share what precisely the policies are for validation's review?

While we believe none of these characters should appear legitimately in certificate Subject fields, this is new ground for us and we are open to the possibility that we are wrong on one or more of them. If review reveals that the character in question actually appears in the documentation we collect, then we will need to reevaluate that character’s inclusion on this list.

Therefore the review will be to compare the field in question to the content of the relevant documentation. If the suspect character is present in the documentation, we will evaluate if our blocklist needs to be adjusted and come to a disposition on whether or not we can issue the certificate as is. If it is not present in the documentation, we will replace the blocking field with correct information that matches the documentation.

We believe we have thoroughly investigated and discussed this matter. We have an automated solution in place to block misissuance due to Mojibake or other disallowed characters. We will continue to monitor and refine our systems based on real-world results. We have addressed all questions we have received on this matter.

Ben, we believe we are ready to close this bug. If there are no additional questions, can we close it?

Flags: needinfo?(bwilson)

I'll schedule this for closure on Friday, 18-Mar-2022. Thanks. Ben

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED

As the next stage in our ongoing development of our systems and process to monitor for and prevent incorrect issuance of certificates with Mojibake or other invalid characters, we have moved entirely to automated checking based on FTFY. We intend to monitor results for some time to ensure we are happy with them.

Product: NSS → CA Program
Whiteboard: [ca-compliance] Next update 2022-02-15 → [ca-compliance] [ov-misissuance]