Closed Bug 280839 Opened 20 years ago Closed 17 years ago

Match unicode cert CNs to ACE (punycode) host names

Categories

(NSS :: Libraries, enhancement)


Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: roc, Assigned: wtc)

References

Details

Attachments

(1 file)

See https://bugzilla.mozilla.org/show_bug.cgi?id=279099#c8 part 1.

Should the domain name in an SSL cert be the punycode-encoded domain name
(that's what we expect), or raw Unicode (that's what some other browsers are
expecting)?
Flags: blocking1.7.6+
Flags: blocking-aviary1.0.1+
I'm not sure whether to answer this question here or in bug 279099.

The answer appears to be: it depends on where the server's name is encoded 
in the certificate.  Remember that hostnames in CommonNames are now 
deprecated.  There is another way, one that is RFC standard compliant, 
to include hostnames in certs, and mozilla supports it.  

The standard way would require punycode for names that include non-ASCII
characters, because it encodes as IA5String.  The old legacy way (commonName)
allows any unicode string (including UTF8, UCS2/UTF16, and UCS4), and so
presumably would NOT use punycode.

According to the informational RFC 2818 there are two places in an https 
server cert where the server's dnsName can be encoded.  They are:
a) in the optional list of "subject alternative names", encoded as a "dnsName", or
b) in the CommonName attribute of the cert's subjectName (the legacy method).

According to RFC 3280, a "dNSName" may only be encoded as an IA5String. 
IA5String is a subset of ASCII.  Therefore, a dNSName in the subjectAltName
extension cannot contain unicode characters (except those that are part 
of the IA5String subset of Unicode).  If a hostname that contained unicode
characters was going to be encoded as a standards compliant dNSName in a
subjectAltName, it would have to be in punycode.
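As an illustration of that ACE (punycode) form, Python's built-in `idna` codec implements the RFC 3490 ToASCII operation; this is just a sketch of what a standards-compliant dNSName would carry, not NSS code:

```python
# Sketch: a dNSName in subjectAltName must be IA5String (ASCII-only),
# so a hostname with non-ASCII characters has to be stored in its ACE
# (punycode) form.  Python's "idna" codec performs the RFC 3490 ToASCII
# conversion.

def to_ace(hostname: str) -> str:
    """Convert a possibly-Unicode hostname to its ACE form."""
    return hostname.encode("idna").decode("ascii")

# A Unicode label gets the xn-- prefix; pure-ASCII names pass through.
assert to_ace("bücher.example") == "xn--bcher-kva.example"
assert to_ace("example.com") == "example.com"
```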

According to RFC 3280, in certs issued prior to Jan 1, 2004, the CommonName
attribute may be encoded as any of these types:
      teletexString     TeletexString   (SIZE (1..ub-common-name)),
      printableString   PrintableString (SIZE (1..ub-common-name)),
      universalString   UniversalString (SIZE (1..ub-common-name)),
      utf8String        UTF8String      (SIZE (1..ub-common-name)),
      bmpString         BMPString       (SIZE (1..ub-common-name)) 
After December 31, 2003, they must be encoded as UTF8String, although
we think some CAs still do it the old way.

The CommonName method was put into use as a de facto industry standard 
before the "subject alternative name" extension was defined and standardized, 
and is still widely used.  But now that the subjectAltName extension is
standardized, use of the CommonName for the server's dnsName is *deprecated*. 

NSS allows the application that uses it to perform its own cert validation,
and NSS provides its own functions that an application may use for this
purpose.  MOST applications that use NSS also use NSS's cert chain validation
and host-name matching functions, and do not implement their own.  

NSS's function for cert name matching accepts as input arguments a cert
object and a UTF8 string that is the name that the application expects to 
find in the cert.  NSS attempts to match that application-supplied string
against the names in the cert.  First it looks for dnsNames in the cert's
subjectAltName extension and compares against them.  If no dnsNames are 
found in the cert's subjectAltName (or if the cert has no subjectAltName)
it checks against the CommonName attribute in the cert's subjectName.
If the CommonName attribute is encoded as teletexstring, bmpstring or
universalstring (which are ISO-8859-1, UCS2 or UCS4), it converts the 
CommonName to UTF8 before comparing.  (printableString is a subset of UTF8,
and needs no conversion before being compared to a UTF8 string.)
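The matching order described above can be modeled with a small sketch (hypothetical field names; the real logic in NSS's CERT_VerifyCertName also handles wildcards and other details omitted here):

```python
# Simplified, hypothetical model of the NSS matching order described
# above: dNSNames in the subjectAltName extension are consulted first;
# only when the cert has none is the subject CommonName checked.

def cert_matches_hostname(cert: dict, hostname: str) -> bool:
    dns_names = cert.get("subjectAltName_dNSNames", [])
    if dns_names:
        # If any dNSNames exist, the CommonName is never consulted.
        return hostname in dns_names
    return cert.get("subject_commonName") == hostname

san_cert = {"subjectAltName_dNSNames": ["example.com", "www.example.com"]}
assert cert_matches_hostname(san_cert, "www.example.com")

cn_only_cert = {"subject_commonName": "legacy.example"}
assert cert_matches_hostname(cn_only_cert, "legacy.example")
```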

In some sense, the question being asked here, about what *should* be 
expected, is one for the standards bodies to answer.  But the CA industry
today is not conforming very tightly to the standards, and I expect we
will have to wait and see what the CAs really do.  

If we find, as I expect, that punycode will be used in subjectAltNames, 
and not in subject commonNames, then we may need to change the NSS 
function for cert name checking to expect the application to provide the 
real (non-punycode) name in UTF8 form, and be able to convert it to 
punycode before comparing against dNSNames in subjectAltNames,
> If we find, as I expect, that punycode will be used in subjectAltNames, 
> and not in subject commonNames, then we may need to change the NSS 
> function for cert name checking to expect the application to provide the 
> real (non-punycode) name in UTF8 form, and be able to convert it to 
> punycode before comparing against dNSNames in subjectAltNames,

Sounds reasonable. Can you do this?
My proposal (converting UTF8 to punycode before comparing against dNSNames
in subject alt names) is feasible.  It requires a c-language callable 
function to do that conversion, which NSS presently does not have.

Perhaps it would be expedient to do this conversion via a callback,
just as NSS once used callbacks to convert between various flavors of 
unicode.  That way, we could potentially re-use mozilla's existing 
punycode conversion code, through some c->c++ wrapper.

My work priorities will not permit me to implement this this year.
Perhaps one of my NSS colleagues can do this.  
> Perhaps it would be expedient to do this conversion via a callback,

Yeah, that sounds like the right approach.  There are several steps involved in
UTF-8 conversion to punycode.  One step involves NFKC normalization, which
presently involves invoking code in mozilla/intl via an XPCOM interface.

Necko's IDN support is provided via nsIIDNService.
(In reply to comment #4)
> Necko's IDN support is provided via nsIIDNService.

So, I think NSS would provide a function to register a punycode conversion
callback, and the cert name checking function would call that function, if 
it is non-NULL, otherwise would work as it does now.  
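The register-a-callback idea might look roughly like this (hypothetical names; this API was a proposal, not something NSS shipped):

```python
# Hypothetical sketch of the proposed callback scheme: the application
# registers a UTF-8-to-punycode converter, and the name check uses it
# when one is registered, otherwise falling back to the plain comparison
# that exists today.

_punycode_cb = None  # module-level registration slot

def register_punycode_callback(cb):
    global _punycode_cb
    _punycode_cb = cb

def check_dns_name(cert_dns_name: str, app_hostname_utf8: str) -> bool:
    if _punycode_cb is not None:
        app_hostname_utf8 = _punycode_cb(app_hostname_utf8)
    return cert_dns_name == app_hostname_utf8

# Without a callback, a Unicode hostname mismatches the ACE name in the cert:
assert not check_dns_name("xn--bcher-kva.example", "bücher.example")

# With a converter registered (here Python's idna codec), it matches:
register_punycode_callback(lambda h: h.encode("idna").decode("ascii"))
assert check_dns_name("xn--bcher-kva.example", "bücher.example")
```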

Presumably PSM would contain the c-language callable c++ callback function
that uses nsIIDNService, and would register that function with NSS.  
Sigh,

Even though we have callbacks to do UTF conversions, NSS also supplies
default functions as well.  In this case maybe we just fail the punycode
compare?  This has a bigger impact on command-line tools than on
applications, typically.

bob
Here's one interesting hack to work with safari:

http://bob.pythonmac.org/archives/2005/02/07/idn-spoofing-defense-for-safari/

Also, you may wish to review the following RFCs:

3454
3490
3491

Cheers, 
Eric
Here is a relevant presentation by someone from thawte:

http://www.icann.org/presentations/valentin-idn-ct-01dec04.pdf
This whole situation reveals that mozilla is passing the punycode version
of the hostname to NSS, rather than passing the UTF8 version of the
hostname.  I believe the correct thing to do is to pass the UTF8 hostname
string to NSS.  That change (to PSM, I believe) would be an immediate
improvement to this issue, I think.
> I believe the correct thing to do is to pass the UTF8 hostname
> string to NSS.

Hmm... Necko passes PSM the hostname via nsISocketProvider.  That API requires
an ASCII hostname.  So, I suppose we could make PSM convert from punycode to
UTF-8 before communicating with NSS.  nsIIDNService provides the methods IsACE()
and ConvertACEtoUTF8(), which could be used.
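The ACE-to-UTF-8 step Darin describes can be illustrated with Python's `idna` codec standing in for nsIIDNService's IsACE()/ConvertACEtoUTF8() (the helper names below are just for illustration):

```python
# Illustration of the ACE -> UTF-8 direction (what ConvertACEtoUTF8()
# would do), using Python's built-in idna codec as a stand-in for
# nsIIDNService.

def is_ace(hostname: str) -> bool:
    # An ACE name contains at least one label with the IDNA "xn--" prefix.
    return any(label.startswith("xn--") for label in hostname.split("."))

def ace_to_utf8(ace_hostname: str) -> str:
    # RFC 3490 ToUnicode, label by label.
    return ace_hostname.encode("ascii").decode("idna")

assert is_ace("xn--bcher-kva.example")
assert not is_ace("example.com")
assert ace_to_utf8("xn--bcher-kva.example") == "bücher.example"
```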
In light of Darin's comment 10, let it be said that there are (at least) 
two ways we could approach this:

a) enhance NSS's existing cert name checking function.  
Leave it defined as it is, where the name received from the application 
for comparison with the cert is defined to be UTF8, and enhance that 
function to make a punycode-encoded copy.  

b) define a new additional cert name checking function (could be in 
NSS or in PSM), which expects the application to supply it ASCII-only names
(including IDN/punycode names) as mozilla now supplies.  This function
would make a UTF8 encoded copy of the input ASCII/punycode string. 

In either function, Subject Common Names would be compared to BOTH the 
UTF8 and punycode strings.  SubjectAltName DNSnames would be compared 
only to the punycode string.
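Option b's comparison rules could be sketched like this (hypothetical helpers, with Python's `idna` codec doing the conversions):

```python
# Sketch of option (b): the application supplies an ASCII/ACE hostname
# (as mozilla does today); the check keeps both the ACE form and a
# decoded UTF-8 copy.  Subject CNs match either form; subjectAltName
# dNSNames match only the ACE form.

def make_name_forms(ascii_hostname: str):
    ace = ascii_hostname.lower()
    utf8 = ace.encode("ascii").decode("idna")  # ACE -> Unicode copy
    return ace, utf8

def cn_matches(cn: str, ascii_hostname: str) -> bool:
    ace, utf8 = make_name_forms(ascii_hostname)
    return cn.lower() in (ace, utf8)

def san_dns_matches(dns_name: str, ascii_hostname: str) -> bool:
    ace, _ = make_name_forms(ascii_hostname)
    return dns_name.lower() == ace

host = "xn--bcher-kva.example"
assert cn_matches("bücher.example", host)          # Unicode CN matches UTF8 copy
assert cn_matches("xn--bcher-kva.example", host)   # punycode CN matches ACE form
assert san_dns_matches("xn--bcher-kva.example", host)
assert not san_dns_matches("bücher.example", host) # SAN must be ACE
```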

In choosing between these solutions, it matters whether CAs are going
to finally stop using cert subject CNs for DNSnames, and use subjectAltNames
for this instead, or whether they are going to continue to encode DNS names
in cert subject CNs, and in the latter case, whether they are going to 
encode those names with UTF8 (per RFC 3280) or encode them with punycode
(as I gather is wanted by the punycode promoters).

Option b above can be implemented without changing any existing NSS
function signatures, and while preserving 100% backwards compatibility
in NSS.  Here's why and how:

When libSSL has collected a cert chain and wants to validate that cert
chain, including validating the name in the chain, it calls an 
*application-supplied* callback function.  Most applications just 
register one of libSSL's own functions (namely SSL_AuthCertificate) as 
that callback function.  SSL_AuthCertificate calls CERT_VerifyCertName,
which is the existing function that accepts UTF8 hostnames for comparison.

But some applications (including Mozilla) register their own callback 
functions.  Mozilla's is seen at 
http://lxr.mozilla.org/seamonkey/source/security/manager/ssl/src/nsNSSCallbacks.cpp#296
Mozilla's function could call a new function that behaves as 
SSL_AuthCertificate but which calls the function described in b above,
and NSS's backwards compatibility would be unchanged.  
As a practical matter, IE is still the dominant browser, so CAs will do what
works in IE + IDN plugin.  From Thawte's slideshow in comment 8 and
http://www.thawte.com/IDN/ it looks like they're putting user-readable stuff in
the CN, and I guess punycode in the subjectAltName (because IE validates the
cert) though they don't give details.
minus for 1.0.1 and plus for a well tested solution in 1.1
Flags: blocking1.8b2+
Flags: blocking1.8b-
Flags: blocking1.7.6-
Flags: blocking1.7.6+
Flags: blocking-aviary1.1+
Flags: blocking-aviary1.0.1-
Flags: blocking-aviary1.0.1+
There is no need to speculate about how to compare IDNs, or whether particular
fields of SSL certs should be expected to contain ACE form or Unicode form. 
Sections 2 and 3 of RFC 3490 answer those questions.

Any field of an SSL cert, no matter whether it allows non-ASCII characters or
not, is expected to contain the ACE form unless the spec for that field
explicitly cites RFC 3490 and invites non-ASCII domain names into the field. 
That doesn't mean you are forbidden from accepting non-ASCII domain names in
that field, but it does mean that whoever put a non-ASCII domain name there was
violating the IDNA spec (and by the way, they can expect the SSL cert to fail
unpredictably when processed by IDN-unaware applications, or by IDN-aware
applications that choose not to be so liberal in what they accept).

As for how to compare domain names, the required method is to convert them both
to ASCII and then compare the ASCII names as usual.  (Technically, you are free
to use any method that always returns the same answer as that method, but the
corner cases are tricky, so your best bet is probably just to do that.)
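The required comparison method (convert both names to ASCII, then compare as usual) can be sketched with Python's `idna` codec:

```python
# RFC 3490 sections 2-3: to compare two domain names, convert both to
# their ASCII (ACE) form with ToASCII, then compare the results the
# usual case-insensitive way.

def ace_equal(name_a: str, name_b: str) -> bool:
    def to_ascii(name: str) -> str:
        return name.encode("idna").decode("ascii").lower()
    return to_ascii(name_a) == to_ascii(name_b)

# Unicode and ACE spellings of the same name compare equal:
assert ace_equal("bücher.example", "xn--bcher-kva.example")
# Plain ASCII names compare case-insensitively as before:
assert ace_equal("Example.COM", "example.com")
# Dropping the diacritic yields a *different* name:
assert not ace_equal("bücher.example", "bucher.example")
```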
We're coming into the endgame on 1.8b2. Is this something that's gonna happen in
the next few days or should it be pushed out to 1.8b3?
Asa, This bug is not going to be addressed for 1.8b2.  

Let's review the situation to understand the severity.  

As I understand it, there is *no spoofing issue here*.  The only issue is
a potential cert name mismatch when the cert contains a Unicode (non-ACE)
DNSname.  This leads to cert name mismatches, but no false positive matches,
so there is no spoofing vulnerability here. 

Mozilla is passing ACE form names to NSS for comparison.  This works JUST FINE 
for certs that have ACE form DNSnames in their CNs and/or subject alt names.  
In comment 14, Adam Costello asserts that RFC 3490 conformant certs will
use ACE form names.  So, the existing NSS code works fine for those certs.

But we know that other standards that predate 3490 allow non ACE-form CNs,
and that there are CAs that issue certs with non-ACE form DNSnames in CNs.
We declare a hostname mismatch for them today.  Some other browsers make
these work.  I think this bug is of interest because many people want mozilla
to work in all cases where other browsers do (with the possible exception of 
MSIE  :) 

Presently, we are effectively being strict about requiring certs to contain
ACE form DNSnames.  It is being suggested that we loosen this to also work
for certs that contain Unicode DNSnames.  

Doing that means adding an MPLed UTF8-to-ACE converter, written in c, to NSS.
I don't have one of those in my pocket.  It won't happen this week, or next.  
Target Milestone: --- → 3.10.1
Version: unspecified → 3.9
Severity: normal → enhancement
Summary: Check handling of punycode domain names → Match unicode cert CNs to ACE (punycode) host names
moving out to a 1.8b3 nomination.
Flags: blocking1.8b2+ → blocking1.8b3?
Blocks: IDN
QA Contact: bishakhabanerjee → jason.m.reid
Flags: blocking1.8b3?
Flags: blocking1.8b3-
Flags: blocking1.8b-
Flags: blocking-aviary1.5+ → blocking1.8b5-
Target Milestone: 3.10.1 → 3.11
QA Contact: jason.m.reid → libraries
AFAIK, we haven't had any bugs filed in the last 24 months about certs
containing non-ACE-form names, so I think this is WFM.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME