Open Bug 237077 Opened 20 years ago Updated 2 years ago

legacy nicknames may contain ISO-Latin1 or UTF8 (should all be UTF8)

Categories

(NSS :: Libraries, defect, P3)

x86
Windows 2000

Tracking

(Not tracked)

People

(Reporter: nelson, Unassigned)

References

Details

(Keywords: intl)

NSS's algorithm(s) for deriving and storing nicknames and email addresses
are not careful to ensure that the result is always UTF8 encoded, or always
ISO-Latin1.  Consequently, some nicknames contain ISO-Latin1 encodings of 
certain characters, and hence are NOT valid UTF8 strings.  This creates 
problems when such strings must be converted to some other encoding, such
as UCS2.  

One consequence of this is bug 217305.  The code to export a cert/key in a
PKCS12 file attempts to convert the nickname from UTF8 to UCS2, and that
attempt fails when the string contains non-ASCII ISO-Latin1 characters.
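
The failure mode can be reproduced outside NSS. What follows is an illustrative
Python sketch, not NSS code: the nickname value is made up, the function name
utf8_to_ucs2 is hypothetical, and UTF-16BE is assumed as a stand-in for the
UCS2 form (the two agree for characters in the Basic Multilingual Plane).

```python
# Hypothetical nickname stored as ISO-Latin1 bytes (0xE9 is "é" in Latin-1).
latin1_nickname = "José's ID".encode("latin-1")


def utf8_to_ucs2(raw: bytes) -> bytes:
    """Strictly decode UTF-8, then emit big-endian 16-bit code units."""
    return raw.decode("utf-8", errors="strict").encode("utf-16-be")


try:
    utf8_to_ucs2(latin1_nickname)
    failed = False
except UnicodeDecodeError:
    # 0xE9 announces a 3-byte UTF-8 sequence, but the next byte (an ASCII
    # apostrophe) is not a continuation byte, so a strict decoder rejects it.
    failed = True

# The same nickname stored as real UTF-8 converts without trouble.
utf8_ok = utf8_to_ucs2("José's ID".encode("utf-8"))
```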

Presumably there are other problems also, such as when the nickname string
is displayed in a GUI dialog.   

I think the problem needs to be solved on two levels:
1) we need to start converting strings from certs from Latin1 to UTF8 in 
places where we do not.
2) we need to deal with DBs that contain unconverted strings.  This may entail:
2a) converting DBs to use UTF8 consistently, or 
2b) employing a heuristic that tries UTF8, and failing that, tries Latin1.
I'd suggest 2b, which is what Mozilla mailnews does for headers.  It is highly
unlikely that a string in ISO-8859-1 would conform to UTF-8 syntax.
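
A minimal sketch of heuristic 2b, in Python for brevity rather than in NSS's C.
The function name decode_legacy_nickname is hypothetical and the fallback
encoding is assumed to be ISO-8859-1 only.

```python
def decode_legacy_nickname(raw: bytes) -> str:
    """Try strict UTF-8 first; on failure, reinterpret the bytes as ISO-8859-1."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Every byte value maps to a code point in ISO-8859-1,
        # so this fallback cannot itself fail.
        return raw.decode("latin-1")


from_latin1 = decode_legacy_nickname(b"Andr\xe9")
from_utf8 = decode_legacy_nickname("André".encode("utf-8"))
```

The heuristic misreads a Latin-1 string only when its bytes also happen to form
well-formed UTF-8 (for example 0xC3 followed by 0xA9), which is the unlikely
case mentioned above.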

I do agree we need to do an audit of ways we could get ISO-8859-1 in the database.
It might be useful for the UTF8-to-UCS2 converter to also act as an 
ISO-8859-x to UCS2 converter.
(In reply to comment #2)
> It might be useful for the UTF8-to-UCS2 converter to also act as an 
> ISO-8859-x to UCS2 converter.

I would disagree.  Strings that are explicitly labeled UTF8 should not be
interpreted as ISO-8859-1.  Otherwise you open up the multiple-encodings
security issue that was the reason for making the UTF8 converters strict.

An ISO-8859-1 to UCS2 converter should be separate.
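
A separate converter of this kind is small, because the first 256 Unicode code
points coincide with ISO-8859-1. The Python sketch below is only an
illustration; the name latin1_to_ucs2 is hypothetical and big-endian output is
an assumption.

```python
def latin1_to_ucs2(raw: bytes) -> bytes:
    """Widen each ISO-8859-1 byte to one big-endian 16-bit code unit."""
    out = bytearray()
    for b in raw:
        out += bytes((0x00, b))  # high byte is always zero for Latin-1
    return bytes(out)


widened = latin1_to_ucs2(b"Andr\xe9")
```

Keeping this apart from the UTF8 converter means the caller must state which
encoding it believes the input to be in, so the strict UTF8 path never guesses.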
2b) isn't likely to work because Latin-1 is not the only legacy encoding in
use; other legacy encodings are likely to be mixed in as well. 

> I'd suggest 2b, which is what Mozilla mailnews does for headers

  Not true in general. You can choose what character encoding to fall back to
per folder. If DBs have only two encodings (UTF-8 and one legacy encoding, be it
ISO-8859-1, Shift_JIS, KOI8-R, Big5), we may let "users"(?) choose what
non-UTF-8 encoding to try. 
Keywords: intl
I'm not 100% sure, but I think comment 4 is not correct.  This is an NSS 
bug, not a PSM bug.  

We're talking about NSS "nicknames" and email addresses.  If I'm not mistaken,
users do not enter their own nicknames for certs any more, but rather nicknames
are extracted from the cert's subject name, typically from the subject common 
name and organization name.  So, the nicknames only contain characters found 
in the certs.  

Characters in the certs may have been encoded as any of the following types
of strings (types denoted by ASN.1): 
    Universal  (UCS4, which NSS converts to UTF8)
    BMPString  (UCS2, which NSS converts to UTF8)
    UTF8       (which is UTF8)
    IA5        (a subset of ASCII, and thus of UTF8)
    Printable  (a subset of ASCII, and thus of UTF8)
    Teletex    (also known as T61).  Until recently, NSS treated this as
               if it, too, was a subset of UTF8, which it is not.

The problem that is the subject of this bug is that TeletexStrings have 
been widely (but improperly) used to hold ISO-Latin1 characters.  
For a long time, NSS did not properly convert Teletex strings to UTF8.
That has recently been fixed by John Myers.  But there are still some
cert databases out there that have unconverted Teletex/ISO-Latin1 character
strings in them.  We have to make those work.
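
The conversions listed above can be summarized as a lookup table. This is a
Python sketch for illustration only, not how NSS implements it: the codec
names are Python's, the dictionary and function names are hypothetical, and
the TeletexString entry encodes the de facto ISO-Latin1 usage described in
this bug rather than real T.61.

```python
# Assumed mapping from ASN.1 string types to Python codec names.
CODEC_FOR_TYPE = {
    "UniversalString": "utf-32-be",  # UCS4
    "BMPString": "utf-16-be",        # UCS2
    "UTF8String": "utf-8",
    "IA5String": "ascii",
    "PrintableString": "ascii",      # does not enforce the restricted alphabet
    "TeletexString": "latin-1",      # de facto usage, not real T.61
}


def to_utf8(type_name: str, value: bytes) -> bytes:
    """Decode a string value by its ASN.1 type and re-encode it as UTF-8."""
    return value.decode(CODEC_FOR_TYPE[type_name]).encode("utf-8")


teletex_as_utf8 = to_utf8("TeletexString", b"Andr\xe9")
```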

I am not aware of any CAs who issue certs that encode other character sets
(such as any Asian character sets) using TeletexString encoding.  AFAIK,
those other character sets have always been encoded using one of the 
sets named above that are properly converted to UTF8.  

But if you can demonstrate a cert DB with a nickname from some other 
character set, please let me know.
Please consider whether this bug also blocks bug 243738.
(In reply to comment #5)

> I'm not 100% sure, but I think comment 4 is not correct.  This is an NSS 
> bug, not a PSM bug.  

Thank you for your explanation and sorry for my ignorance. 

>     Teletex    (also known as T61).  Until recently, NSS treated this as
>                if it, too, was a subset of UTF8, which it is not.

> I am not aware of any CAs who issue certs that encode other character sets
> (such as any Asian character sets) using TeletexString encoding.  AFAIK,
> those other character sets have always been encoded using one of the 
> sets named above that are properly converted to UTF8.  
 
In comment #1, John wrote :
> I'd suggest 2b, which is what Mozilla mailnews does for headers.  It is highly
> unlikely that a string in ISO-8859-1 would conform to UTF-8 syntax.
  
Given what Nelson wrote, 2b certainly makes sense. Strings in ISO-8859-1 are
almost never valid UTF-8 strings (unless they're just ASCII strings). We have to
be careful, though. Strings in Windows-1252 (a superset of ISO-8859-1) can be
valid UTF-8 strings (and can be subject to misinterpretation). Nelson, is there
any CA that mistakenly uses 'Teletex' to store Windows-1252 strings? I hope not,
because if there is, things get complicated.
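
To make the Windows-1252 concern concrete, here is a small Python illustration.
The three bytes are chosen by hand for the example and are not taken from any
real certificate.

```python
raw = b"\xe2\x80\x9c"

as_utf8 = raw.decode("utf-8")      # one character: U+201C, left double quote
as_cp1252 = raw.decode("cp1252")   # three printable characters: "â", "€", "œ"
as_latin1 = raw.decode("latin-1")  # "â" followed by two C1 control characters
```

All three decodings succeed. The Windows-1252 reading is made up entirely of
printable characters, so a try-UTF8-first heuristic cannot tell from the bytes
alone which reading was intended.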



  
QA Contact: bishakhabanerjee → jason.m.reid
Assignee: wtchang → nobody
QA Contact: jason.m.reid → libraries
Summary: nicknames may be ISO-Latin1 or UTF8 → legacy nicknames may contain ISO-Latin1 or UTF8 (should all be UTF8)
Priority: -- → P3
Severity: normal → S3