Closed Bug 1598390 Opened 5 years ago Closed 4 years ago

Microsoft PKI Services: Null Character Bug and Microsoft Root CAs

Categories

(CA Program :: CA Certificate Compliance, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jcooper, Assigned: julio.montano)

References

Details

(Whiteboard: [ca-compliance] [ca-misissuance])


Steps to reproduce:

  1. How your CA first became aware of the problem

This problem was noticed while running x509lint tests: parsing the certs returns the error “URL contains a null character”. We are proactively reporting this now since it did not receive discussion, comments, or questions during the Mozilla application process.

The error is caused by a null character added to the end of a URL due to a bug in Windows Server 2012 R2 Certificate Services. The error is not known to cause any technical problems and is only noticeable during a verbose examination. It does not impact the usefulness of encoded certificates, so Microsoft considered this a cosmetic bug and as such no update was issued to resolve it.

  2. A timeline of actions your CA took in response

No actions were taken since we view this error as non-problematic. Shortly after creation of the affected certs Microsoft PKI Services began using a newer version of Windows Server, which contains a fix for the bug that causes this error.

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem.

We are not issuing new certificates with this problem. All certificates are now issued using newer versions of Windows Server.

  4. A summary of the problematic certs.

The four root CAs listed below show this error when parsed with a lint tool.

  5. The complete certificate data for the problematic certificates.

https://crt.sh/?id=988218851
https://crt.sh/?id=988140328
https://crt.sh/?id=988215004
https://crt.sh/?id=988137612

  6. Explanation about how the mistakes were made or bugs introduced, and how they avoided detection until now.

The bug appears to have existed since the release of Windows Server 2012 R2. The bug avoided detection by Microsoft PKI Services because we never saw any indications of technical problems or compatibility issues associated with our certificates.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Going forward we are only using newer versions of Windows Server Certificate Services that do not have this bug.
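
The symptom described in this report, a URL whose encoded value ends in a null character, can be illustrated with a small sketch. It does not attempt to reproduce the Windows Server bug itself, whose internals are not described here; it only shows what the resulting encoding looks like. The helper name `der_ia5string` and the URL `http://example.com/cps` are assumptions made for illustration and are not taken from the affected certificates.

```python
def der_ia5string(text: str) -> bytes:
    """Encode text as a DER IA5String (tag 0x16), short-form length only."""
    data = text.encode("ascii")
    if len(data) > 127:
        raise ValueError("sketch only handles short-form lengths (< 128 bytes)")
    return bytes([0x16, len(data)]) + data


# Hypothetical CPS URL, used only for illustration.
clean = der_ia5string("http://example.com/cps")
buggy = der_ia5string("http://example.com/cps\x00")

# The only differences are a length byte that is one larger and a trailing 0x00.
print(clean.hex())
print(buggy.hex())
```

Because the extra byte sits inside the string value, a viewer that treats the value as a C-style string would likely display both URLs identically, which is consistent with the error only showing up under a linter or a verbose examination.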

Assignee: wthayer → jcooper
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance]
Blocks: 1582254

(In reply to Jason Cooper from comment #0)

  5. The complete certificate data for the problematic certificates.

https://crt.sh/?id=988218851
https://crt.sh/?id=988140328
https://crt.sh/?id=988215004
https://crt.sh/?id=988137612

Peter Gutmann's list of problematic certs shows several other certs with similar problems in these hierarchies:
https://groups.google.com/d/msg/mozilla.dev.security.policy/ui5XV7wbalw/8kkfpBDbBAAJ

It is the CA's responsibility to review all of the certs in their CA hierarchies when problems are reported, and to identify the full scope of the problem.

Jason: Thanks for filing the incident report as you became aware.

While I think it's useful to understand the root cause, in terms of the software you're using, I think it's also useful to understand the root cause in terms of why this evaded detection until publicly reported, as well as the analysis that Microsoft has done of the relevant standards. For example, if the view is that the linters are invalid, what steps are being taken to correct, if any? If the view is that Microsoft could have caught this with linting, what steps are being taken to incorporate pre-issuance and post-issuance linting?

For example, the particular field with the literal NUL (\0) is the CPS URI. RFC 5280, Section 4.2.1.4 states:

   The CPS Pointer qualifier contains a pointer to a Certification
   Practice Statement (CPS) published by the CA.  The pointer is in the
   form of a URI.  Processing requirements for this qualifier are a
   local matter.  No action is mandated by this specification regardless
   of the criticality value asserted for the extension.

The URI syntax is referenced elsewhere in RFC 5280, and refers to RFC 3986.

Nominally, the NUL character is occurring in the path segment, which follows the ABNF form *pchar, for which pchar is:

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

Continuing that specification, however, we end up with:

   pct-encoded   = "%" HEXDIG HEXDIG

   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

This is all from RFC 3986, Section 3.3.

So while it may be valid for an IA5String to contain a literal NUL, it does not appear valid for the CPS URI to contain a NUL, and this would appear to be a violation of RFC 5280. So I'm not sure we can or should be dismissive about this.
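
To make the grammar argument concrete, here is a minimal sketch that checks a URI path against the `pchar` rules quoted above. The function name `path_is_valid` and the sample paths are assumptions for illustration; the regular expression is a hand transcription of the ABNF rather than a complete RFC 3986 validator.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"   (RFC 3986)
_PCHAR = r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2}|[!$&'()*+,;=]|[:@])"
# path-abempty = *( "/" segment ), where segment = *pchar
_PATH_RE = re.compile(rf"(?:/{_PCHAR}*)*")


def path_is_valid(path: str) -> bool:
    """Return True if every character of path fits the pchar grammar."""
    return _PATH_RE.fullmatch(path) is not None


# Hypothetical paths, not taken from the affected certificates.
print(path_is_valid("/docs/cps.htm"))        # ordinary path
print(path_is_valid("/docs/cps.htm\x00"))    # literal NUL appended
print(path_is_valid("/docs/cps.htm%00"))     # NUL written as pct-encoded
```

A literal NUL matches none of the alternatives, whereas `%00` is syntactically a `pct-encoded` triplet, so the grammar distinguishes the raw byte from its escaped form.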

Flags: needinfo?(jcooper)

Comment #7 in bug 1582254 asks if Mozilla should finish adding these roots to our root store.

Jason:

  • it appears that this problem also affects Microsoft subordinate CAs? (e.g. https://crt.sh/?id=1197076917&opt=cablint). If so, why were these not reported, and what is Microsoft doing to ensure that all occurrences of this issue have been identified?
  • the timeline provided in your incident report does not meet the requirements, specifically "A timeline is a date-and-time-stamped sequence of all relevant events". Please review Mozilla's incident reporting requirements and update your report accordingly.
  • In the root inclusion request, Microsoft made the statement "[MS] All certificates have been tested and are compliant". How was this issue missed?
  • It's not clear from your incident report if this problem was detected by Microsoft before or after the root signing ceremony?
  • "it was a bug and now it's fixed" is not an acceptable remediation. We are looking for an analysis of the root cause and steps that can prevent similar failures in the future.

Hi all, below please find my answers to the various questions. As always, please let me know if you have additional questions.

To Kathleen’s question.

Correct, I was focused on the roots and didn’t list the intermediate CAs also affected by this issue. As it relates to this discussion, here is the complete list.

https://crt.sh/?id=988218851
https://crt.sh/?id=988140328
https://crt.sh/?id=988215004
https://crt.sh/?id=988137612
https://crt.sh/?id=1197076917
https://crt.sh/?id=1197067049
https://crt.sh/?id=1197079848
https://crt.sh/?id=1197075787

Please note that I am not listing the certificates that are designed to only be trusted on Microsoft platforms or any private PKI that is affected by this issue.

To Ryan’s comments and questions.

We have examined the relevant standards (RFC 5280 and RFC 3986). While RFC 5280's references to RFC 3986 might be specific to URIs in Subject Alternative Name extensions, I don’t think that interpreting the standards at that level of granularity is useful, and it would not excuse the error or the bug that caused it. I think it is safe to say that if we had had pre-issuance linting tools at the time we created these CAs, we would likely have caught the error. We have built up our pre- and post-issuance inspection capabilities quite a bit in the last several months and will continue to invest in this area.

To Wayne’s comments and questions.

It appears that this problem also affects Microsoft subordinate CAs? (e.g. https://crt.sh/?id=1197076917&opt=cablint). If so, why were these not reported, and what is Microsoft doing to ensure that all occurrences of this issue have been identified?

Correct with respect to some subordinates (please see above list). To identify all occurrences we are using our asset and certificate lifecycle management systems to examine the hierarchies we manage that were created using the affected version of Windows Server. We have examined all other intermediates relevant to this discussion, and with the exception of the four noted above, they were created using a version of Windows Server that is not affected by this bug. Additionally, we have increased our investment in pre- and post-issuance inspection to ensure we catch potential future issues.

The timeline provided in your incident report does not meet the requirements, specifically "A timeline is a date-and-time-stamped sequence of all relevant events". Please review Mozilla's incident reporting requirements and update your report accordingly.

Regarding the timeline of relevant events, here is the sequence of events that led up to this bug being opened.

November 4, 2019: A private report was sent to Microsoft notifying us of the null character issue.

November 4-6, 2019: Members of the Microsoft PKI Services team investigated and were able to confirm the issue by running lint tests and through close examination of the encoded certs. After reproduction tests, we concluded that it was not caused by steps in our CA creation ceremonies but was likely a bug in Windows Server 2012 R2 Certificate Services.

During this time we also checked internal discussion records as well as the comments made throughout the course of Bug 1448093. We found no indications that the error was known by current members of the Microsoft PKI Services team. We also saw that the error did not receive discussion during the Mozilla inclusion application process.

November 7, 2019: We reached out to Mozilla representatives letting them know this issue had come to our attention. It was agreed that we would open an incident bug.

November 7, 2019 to November 21, 2019: Microsoft PKI Services team members worked closely with Microsoft employees responsible for the Active Directory Certificate Services (ADCS) features in Windows Server. We felt it was important to understand the bug and any perceived or real risks associated with the null character in affected certs. The conclusion was that this bug does not introduce security risks and that it was considered a cosmetic bug. No updates or other fixes were issued to address the bug in Windows Server 2012 R2. The bug was fixed with the introduction of Windows Server 2016.

November 21, 2019: This Bug (1598390) was opened.

In the root inclusion request, Microsoft made the statement "[MS] All certificates have been tested and are compliant". How was this issue missed?

That is an excellent question and one we have asked ourselves. Unfortunately, I would have to speculate as to that specific comment. However, I think it is fair to say we should have had better tooling/processes to inspect our certificates and as I noted previously, we have made and will continue to make investments on this front.

It's not clear from your incident report if this problem was detected by Microsoft before or after the root signing ceremony?

I hope I cleared that up with the sequence of events above. Current members of our team were not aware of the problem at the time of the root creation ceremony. We remained unaware of the problem until earlier this month as noted above. As soon as we were aware, we started investigating in order to understand the issue so that we could open this bug with as much verified information as possible.

"It was a bug and now it's fixed" is not an acceptable remediation. We are looking for an analysis of the root cause and steps that can prevent similar failures in the future.

Please let me know if this post doesn’t sufficiently address root cause and our plans for future prevention. I’m happy to answer any other specific questions or concerns.

Flags: needinfo?(jcooper)

Jason: Thanks for the update.

With respect to the following:

That is an excellent question and one we have asked ourselves. Unfortunately, I would have to speculate as to that specific comment. However, I think it is fair to say we should have had better tooling/processes to inspect our certificates and as I noted previously, we have made and will continue to make investments on this front.

It doesn't really give any clear or actionable next steps. Without wanting to sound too negative, this seems to be a bit of a "shrug and whoops". I think it's reasonable to try to understand what sort of steps are being taken, so that future CA generation ceremonies, and signing, are going to adhere to the BRs.

I think one understandable element of concern is trying to understand ADCS's compliance with the BRs and RFC 5280. This isn't to pick on ADCS - we've seen the same with other COTS solutions like EJBCA, and with home-grown solutions - but trying to understand what steps have been taken by CAs to ensure that the certificates they create are going to be compliant.

One way to address this is by describing a bit more about the generation ceremonies, how they're scripted and reviewed, and what sort of compliance controls exist, both technical and procedural. Understanding those controls, where they might have failed or been deficient, and how they're being improved going forward, is key.

Further, given that these certificates are questionable, past precedent has been to request the CA create properly-encoded versions that can be used (e.g. sharing the same subject and SPKI and SKI, but with the encoding issues fixed). For example, invalid extensions or serial numbers have been corrected in this way, to ensure that, going forward, the Mozilla Root Store contains RFC5280-compliant CAs, and actively used ICAs are also RFC5280 compliant. Are there problems with this approach that would make it challenging? Understanding what steps can be taken here is key.
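
As a rough illustration of the "same subject and SPKI and SKI" requirement described above, a replacement could be compared against the original along these lines. This is a minimal sketch: `CaIdentity` and `replacement_mismatches` are hypothetical names, and the byte values are placeholders rather than data from the real certificates. Extracting these fields from an actual certificate would need an X.509 parser, which is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaIdentity:
    subject_der: bytes  # DER-encoded subject Name
    spki_der: bytes     # DER-encoded SubjectPublicKeyInfo
    ski: bytes          # Subject Key Identifier value


def replacement_mismatches(old: CaIdentity, new: CaIdentity) -> list[str]:
    """Return the names of identity fields that differ between old and new."""
    fields = ("subject_der", "spki_der", "ski")
    return [name for name in fields if getattr(old, name) != getattr(new, name)]


# Placeholder values for illustration only.
old_root = CaIdentity(b"subject-bytes", b"spki-bytes", b"ski-bytes")
new_root = CaIdentity(b"subject-bytes", b"spki-bytes", b"ski-bytes")
print(replacement_mismatches(old_root, new_root))  # empty list when all three match
```

An empty result means the replacement keeps the identity that relying parties use for chain building, so only the corrected encoding differs.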

Flags: needinfo?(jcooper)

Hi Ryan,

Apologies for the delay in responding to this. Please know that we are taking this matter seriously and are planning to do as you suggest by creating properly encoded versions of the CAs in question. Additionally, we will be revoking the four intermediates noted above. I will report back when that work is complete, including updates to CCADB. Please let us know if you have any concerns with that course of action.

Regarding the generation ceremonies, the process starts with templatized step-by-step ceremony scripts that are customized for each ceremony. Each ceremony is reviewed by multiple trusted role personnel and approved or rejected in CA management systems for record keeping and audit purposes. The step-by-step process included a number of technical and procedural compliance controls, signed off by multiple trusted role personnel at each step. As I mentioned in my previous post, the failure here was the absence of a step to inspect the output of the ceremony (the encoded cert) for an error like the one associated with this bug. We did not have tools in our offline suite to perform this inspection at the time of creation, and as a result we missed the null character introduced by that particular version of Windows. We have now added steps and tooling to this overall process to make sure we don't release a non-compliant cert, and we are building additional tools as I write this.
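
For readers wondering what an inspection step of this kind might look like, here is a minimal sketch of a check that walks DER-encoded bytes and reports any IA5String containing a NUL byte. It is not the tooling Microsoft describes, and it is far narrower than x509lint or cablint. All function names are assumptions, and the sample data is built by hand rather than taken from a real certificate.

```python
IA5STRING = 0x16
OCTET_STRING = 0x04
CONSTRUCTED = 0x20


def read_tlv(buf: bytes, pos: int):
    """Parse one DER TLV at pos; return (tag, value_start, value_end)."""
    tag = buf[pos]
    if (tag & 0x1F) == 0x1F:
        raise ValueError("multi-byte tags are not handled in this sketch")
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        n = length & 0x7F
        if n == 0:
            raise ValueError("indefinite length is not valid DER")
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    end = pos + length
    if end > len(buf):
        raise ValueError("TLV length runs past end of buffer")
    return tag, pos, end


def find_nul_ia5strings(der: bytes) -> list[bytes]:
    """Return every universal IA5String value that contains a NUL byte."""
    hits = []
    pos = 0
    while pos < len(der):
        tag, start, end = read_tlv(der, pos)
        value = der[start:end]
        if tag == IA5STRING and b"\x00" in value:
            hits.append(value)
        elif tag & CONSTRUCTED:
            hits.extend(find_nul_ia5strings(value))
        elif tag == OCTET_STRING:
            # Extension values are wrapped in OCTET STRINGs; try to descend,
            # and treat the content as opaque bytes if it does not parse.
            try:
                hits.extend(find_nul_ia5strings(value))
            except (ValueError, IndexError):
                pass
        pos = end
    return hits


def tlv(tag: int, value: bytes) -> bytes:
    """Build a short-form TLV (value < 128 bytes) for the demo below."""
    assert len(value) < 128
    return bytes([tag, len(value)]) + value


# Hand-built sample: one bad URI nested in an OCTET STRING, one good URI.
bad_uri = tlv(IA5STRING, b"http://example.com/cps\x00")
good_uri = tlv(IA5STRING, b"http://example.com/crl")
sample = tlv(0x30, tlv(OCTET_STRING, tlv(0x30, bad_uri)) + good_uri)
print(find_nul_ia5strings(sample))
```

The sketch only looks at universal IA5String values, which is how the CPS URI is encoded; URIs carried in context-tagged fields would need additional handling, and opaque OCTET STRING contents that happen to parse as DER could in principle produce false positives.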

Please let us know if you have additional questions. I will update this thread when we have completed the work described in the first paragraph of this post.

Flags: needinfo?(jcooper)

Mohan, Julio: My understanding is Jason's no longer the primary POC. Do you have an update to provide here for Microsoft? I also couldn't find a Bugzilla account for John, so wasn't able to CC them.

Flags: needinfo?(mohanr)
Flags: needinfo?(julio.montano)
Summary: Incident Report: Null Character Bug and Microsoft Root CAs → Microsoft: Null Character Bug and Microsoft Root CAs

Hello Ryan,

Please see below the timeline of events since Jason's update.

  1. On 12/18/19 we reissued all of the roots with the same key pairs. A certificate linting step was also added to our CA ceremony template to ensure that these types of errors are found before certificates are released going forward. The re-issued certificates can be found here:

http://www.microsoft.com/pkiops/certs/Microsoft ECC Root Certificate Authority 2017.crt
http://www.microsoft.com/pkiops/certs/Microsoft RSA Root Certificate Authority 2017.crt
http://www.microsoft.com/pkiops/certs/Microsoft EV ECC Root Certificate Authority 2017.crt
http://www.microsoft.com/pkiops/certs/Microsoft EV RSA Root Certificate Authority 2017.crt

  2. We closed our WebTrust BR audit on 12/31/20 containing these roots and expect an opinion from BDO by the end of the month of March 2020.
  3. On 1/17/20 we issued new intermediate CAs from the new roots. The certificates can be found here:

http://www.microsoft.com/pkiops/certs/Microsoft Azure ECC TLS Issuing CA 01.crt
http://www.microsoft.com/pkiops/certs/Microsoft Azure ECC TLS Issuing CA 02.crt
http://www.microsoft.com/pkiops/certs/Microsoft Azure ECC TLS Issuing CA 05.crt
http://www.microsoft.com/pkiops/certs/Microsoft Azure ECC TLS Issuing CA 06.crt

http://www.microsoft.com/pkiops/certs/Microsoft Azure TLS Issuing CA 01.crt
http://www.microsoft.com/pkiops/certs/Microsoft Azure TLS Issuing CA 02.crt
http://www.microsoft.com/pkiops/certs/Microsoft Azure TLS Issuing CA 05.crt
http://www.microsoft.com/pkiops/certs/Microsoft Azure TLS Issuing CA 06.crt

  4. On 1/22/20 (PST), we revoked all of the previous issuing CAs issued from the four roots containing the null character bug that had not already been revoked.

https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20ECC%20TLS%20Issuing%20CA%2001.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20ECC%20TLS%20Issuing%20CA%2002.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20ECC%20TLS%20Issuing%20CA%2005.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20ECC%20TLS%20Issuing%20CA%2006.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20Issuing%20CA%2001.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20Issuing%20CA%2002.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20Issuing%20CA%2005.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20Issuing%20CA%2006.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20EV%20Issuing%20CA%2001.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20EV%20Issuing%20CA%2002.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20EV%20Issuing%20CA%2005.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20ECC%20EV%20Issuing%20CA%2006.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20EV%20Issuing%20CA%2001.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20EV%20Issuing%20CA%2002.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20EV%20Issuing%20CA%2005.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20EV%20Issuing%20CA%2006.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20Issuing%20CA%2001.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20Issuing%20CA%2002.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20Issuing%20CA%2003.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20Issuing%20CA%2005.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20Issuing%20CA%2006.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20TLS%20Issuing%20CA%2007.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20TLS%20Issuing%20CA%2001.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20TLS%20Issuing%20CA%2002.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20TLS%20Issuing%20CA%2005.crt
https://www.microsoft.com/pkiops/certs/Microsoft%20Azure%20TLS%20Issuing%20CA%2006.crt

  5. On 3/6, two of the re-issued roots were included in the monthly Microsoft Trusted Root program update. Karina Sirota will update CCADB with these roots by the end of the week.

http://www.microsoft.com/pkiops/certs/Microsoft ECC Root Certificate Authority 2017.crt
http://www.microsoft.com/pkiops/certs/Microsoft RSA Root Certificate Authority 2017.crt

  6. The remaining two re-issued roots will be added to the Microsoft Trusted Root program and the CCADB record will be updated by the end of March.

http://www.microsoft.com/pkiops/certs/Microsoft%20EV%20ECC%20Root%20Certificate%20Authority%202017.crt
http://www.microsoft.com/pkiops/certs/Microsoft%20EV%20RSA%20Root%20Certificate%20Authority%202017.crt

  7. The four previous roots with the null character URI bug will be removed from the Microsoft Trusted Root program by the end of April.

Please let John Mason and me know if you have additional questions. We will update this thread as the work described in steps 6 and 7 is completed.
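
Some of the certificate URLs listed above contain literal spaces, while others are percent-encoded. For anyone fetching them with a script, a minimal sketch of normalizing the path is shown here. The helper name `encode_path` is an assumption; `urlsplit`, `urlunsplit` and `quote` are from Python's standard `urllib.parse` module.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path(url: str) -> str:
    """Percent-encode the path of url, leaving scheme and host untouched."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and avoids double-encoding existing escapes.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


url = "http://www.microsoft.com/pkiops/certs/Microsoft ECC Root Certificate Authority 2017.crt"
print(encode_path(url))
```

Keeping `%` in the safe set makes the function idempotent for already-encoded URLs, at the cost of not escaping a stray literal `%` in a path.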

Flags: needinfo?(julio.montano)

Hello Ryan, based on the previous post from Julio Montano, I believe all the info has been provided. If you concur, please clear the needinfo flag and set the disposition of the bug accordingly.

Flags: needinfo?(mohanr)
Assignee: jcooper → julio.montano
Flags: needinfo?(wthayer)

Given that steps 6 and 7 in comment #8 are not relevant to Mozilla, it appears that remediation is complete.

Julio, before I close this bug, please update bug #1448093 and bug #1582254 with the new replacement roots that Mozilla should include.

Flags: needinfo?(wthayer) → needinfo?(julio.montano)

Hello Wayne,

The comments have been added to the bugs that you mentioned. I did not attach them to the bugs, as they are named similarly to the previous versions of the roots. Finally, our latest audit report that covers these roots will be uploaded to CCADB shortly.

Flags: needinfo?(julio.montano)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ca-misissuance]
Summary: Microsoft: Null Character Bug and Microsoft Root CAs → Microsoft PKI Services: Null Character Bug and Microsoft Root CAs