Open
Bug 237077
Opened 20 years ago
Updated 2 years ago
legacy nicknames may contain ISO-Latin1 or UTF8 (should all be UTF8)
Categories
(NSS :: Libraries, defect, P3)
Tracking
(Not tracked)
NEW
People
(Reporter: nelson, Unassigned)
References
Details
(Keywords: intl)
NSS's algorithm(s) for deriving and storing nicknames and email addresses are not careful to ensure that the result is always UTF8 encoded, or always ISO-Latin1. Consequently, some nicknames contain ISO-Latin1 encodings of certain characters, and hence are NOT valid UTF8 strings. This creates problems when such strings must be converted to some other encoding, such as UCS2.

One consequence of this is bug 217305. The code to export a cert/key in a PKCS12 file attempts to convert the nickname from UTF8 to UCS2, and that attempt fails when the string contains ISO-Latin1-encoded (non-ASCII) characters. Presumably there are other problems as well, such as when the nickname string is displayed in a GUI dialog.

I think the problem needs to be solved on two levels:
1) we need to start converting strings from certs from Latin1 to UTF8 in the places where we do not yet do so.
2) we need to deal with DBs that contain unconverted strings. This may entail:
2a) converting DBs to use UTF8 consistently, or
2b) employing a heuristic that tries UTF8 and, failing that, tries Latin1.
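Option 2b could be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not NSS code. The names `is_valid_utf8`, `latin1_to_utf8` and `legacy_nickname_to_utf8` are hypothetical, and the sketch assumes that any stored nickname failing a strict UTF-8 check is ISO-Latin1. The check is only a heuristic. For example, the Latin-1 string "Ã©" (bytes C3 A9) is also valid UTF-8 for "é" and would be misread.

```c
#include <stddef.h>
#include <string.h>

/* Strict UTF-8 check (RFC 3629): rejects overlong forms, UTF-16
   surrogates (U+D800..U+DFFF) and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;                       /* continuation bytes expected */
        unsigned long cp;
        if (c < 0x80) { i++; continue; }
        if (c >= 0xC2 && c <= 0xDF)      { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return 0;                  /* 0x80..0xC1 and 0xF5..0xFF cannot lead */
        if (len - i <= n) return 0;     /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                   /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return 1;
}

/* Each ISO-8859-1 byte value is the Unicode code point of the same value,
   so bytes >= 0x80 become two UTF-8 bytes. `out` needs 2 * len + 1 bytes. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = (char)in[i];
        } else {
            out[o++] = (char)(0xC0 | (in[i] >> 6));
            out[o++] = (char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}

/* Option 2b (hypothetical helper): keep the stored bytes if they validate
   as UTF-8, otherwise assume legacy ISO-8859-1 and convert. */
static size_t legacy_nickname_to_utf8(const unsigned char *in, size_t len, char *out)
{
    if (is_valid_utf8(in, len)) {
        memcpy(out, in, len);
        out[len] = '\0';
        return len;
    }
    return latin1_to_utf8(in, len, out);
}
```

The same `latin1_to_utf8` step is what item 1 would apply when a Latin-1 string is first taken from a cert, before it ever reaches the DB.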
Comment 1•20 years ago
I'd suggest 2b, which is what Mozilla mailnews does for headers. It is highly unlikely that a string in ISO-8859-1 would conform to UTF-8 syntax. I do agree we need to do an audit of ways we could get ISO-8859-1 in the database.
Reporter
Comment 2•20 years ago
It might be useful for the UTF8-to-UCS2 converter to also act as an ISO-8859-x to UCS2 converter.
Comment 3•20 years ago
(In reply to comment #2)
> It might be useful for the UTF8-to-UCS2 converter to also act as an
> ISO-8859-x to UCS2 converter.

I would disagree. Strings that are explicitly labeled UTF8 should not be interpreted as ISO-8859-1. Otherwise you open up the multiple-encodings security issue that was the reason for making the UTF8 converters strict. An ISO-8859-1 to UCS2 converter should be separate.
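To illustrate keeping the two converters separate, here is a minimal sketch. The names `utf8_to_ucs2_strict` and `latin1_to_ucs2` are hypothetical and are not NSS functions. The strict decoder rejects anything that is not well-formed UTF-8, including overlong forms. The classic example is C0 AF, which a lax decoder turns into "/". That kind of multiple-encodings ambiguity is the security issue mentioned above. Code points outside the BMP are also rejected, because UCS-2 cannot represent them.

```c
#include <stddef.h>
#include <stdint.h>

/* Strict UTF-8 -> UCS-2. Returns the number of 16-bit units written, or -1
   if the input is malformed, overlong, a surrogate, outside the BMP, or if
   `out` (capacity `cap` units) is too small. */
static long utf8_to_ucs2_strict(const unsigned char *in, size_t len,
                                uint16_t *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < len) {
        unsigned char c = in[i];
        uint32_t cp;
        size_t n;                                  /* continuation bytes */
        if (c < 0x80)                    { cp = c;        n = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; n = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; n = 2; }
        else return -1;  /* stray continuation, overlong lead C0/C1,
                            4-byte (non-BMP) lead, or invalid byte */
        if (len - i <= n) return -1;               /* truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((in[i + k] & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                             /* overlong or surrogate */
        if (o == cap) return -1;
        out[o++] = (uint16_t)cp;
        i += n + 1;
    }
    return (long)o;
}

/* Separate ISO-8859-1 -> UCS-2: each byte maps to the code point with the
   same value, so no decoding decisions are needed. */
static long latin1_to_ucs2(const unsigned char *in, size_t len,
                           uint16_t *out, size_t cap)
{
    if (len > cap) return -1;
    for (size_t i = 0; i < len; i++)
        out[i] = in[i];
    return (long)len;
}
```

Under this design, a caller that wants the comment-1 fallback must explicitly call `latin1_to_ucs2` after the strict decoder fails. The UTF-8 decoder never guesses on its own.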
Comment 4•20 years ago
2b) isn't likely to work, because Latin-1 is not the only legacy encoding in use;
other legacy encodings are likely to be mixed in as well.
> I'd suggest 2b, which is what Mozilla mailnews does for headers
Not true in general. You can choose what character encoding to fall back to
per folder. If DBs have only two encodings (UTF-8 and one legacy encoding, be it
ISO-8859-1, Shift_JIS, KOI8-R, Big5), we may let "users"(?) choose what
non-UTF-8 encoding to try.
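Suppose a DB is known to mix UTF-8 with exactly one legacy encoding. Letting the user pick that encoding could then look roughly like the following sketch, built on POSIX `iconv(3)`. The names `to_utf8` and `nickname_to_utf8` are hypothetical. Which charset names (such as "SHIFT_JIS", "KOI8-R" or "BIG5") are accepted depends on the platform's iconv implementation; glibc supports those shown. The first pass converts from UTF-8 to UTF-8. With glibc, for example, that pass fails with EILSEQ or EINVAL on malformed input, so it doubles as a validity check.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert `len` bytes from `from_charset` to NUL-terminated UTF-8 in `out`.
   Returns the number of bytes written, or -1 with errno set (EILSEQ/EINVAL
   for input not valid in `from_charset`, E2BIG if `out` is too small). */
static long to_utf8(const char *from_charset, const char *in, size_t len,
                    char *out, size_t cap)
{
    if (cap == 0) { errno = E2BIG; return -1; }
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return -1;                      /* charset not supported */
    char *inp = (char *)in;             /* POSIX iconv() takes char ** */
    size_t inleft = len;
    char *outp = out;
    size_t outleft = cap - 1;           /* keep one byte for the NUL */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)               /* flush shift state (stateful charsets) */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    int saved_errno = errno;
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }
    *outp = '\0';
    return (long)(outp - out);
}

/* Hypothetical lookup: try UTF-8 first, then the single legacy charset the
   user chose for this DB (e.g. "ISO-8859-1", "SHIFT_JIS", "KOI8-R"). */
static long nickname_to_utf8(const char *user_fallback, const char *in,
                             size_t len, char *out, size_t cap)
{
    long n = to_utf8("UTF-8", in, len, out, cap);
    if (n >= 0 || (errno != EILSEQ && errno != EINVAL))
        return n;                       /* valid UTF-8, or a non-encoding error */
    return to_utf8(user_fallback, in, len, out, cap);
}
```

On glibc, iconv is part of libc. Some other platforms need `-liconv` at link time.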
Keywords: intl
Reporter
Comment 5•20 years ago
I'm not 100% sure, but I think comment 4 is not correct. This is an NSS bug, not a PSM bug. We're talking about NSS "nicknames" and email addresses. If I'm not mistaken, users no longer enter their own nicknames for certs; rather, nicknames are extracted from the cert's subject name, typically from the subject common name and organization name. So the nicknames contain only characters found in the certs.

Characters in the certs may have been encoded as any of the following types of strings (types denoted by ASN.1):
Universal (UCS4, which NSS converts to UTF8)
BMPString (UCS2, which NSS converts to UTF8)
UTF8 (which is UTF8)
IA5 (a subset of ASCII, and thus of UTF8)
Printable (a subset of ASCII, and thus of UTF8)
Teletex (also known as T61). Until recently, NSS treated this as if it, too, was a subset of UTF8, which it is not.

The problem that is the subject of this bug is that TeletexStrings have been widely (but improperly) used to hold ISO-Latin1 characters. For a long time, NSS did not properly convert Teletex strings to UTF8. That has recently been fixed by John Myers. But there are still some cert databases out there that have unconverted Teletex/ISO-Latin1 character strings in them. We have to make those work.

I am not aware of any CAs who issue certs that encode other character sets (such as any Asian character sets) using TeletexString encoding. AFAIK, those other character sets have always been encoded using one of the sets named above that are properly converted to UTF8. But if you can demonstrate a cert DB with a nickname from some other character set, please let me know.
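The per-type handling in this list amounts to a dispatch on the ASN.1 universal tag number. The following is a minimal sketch, not NSS's actual decoder. The names `asn1_string_to_utf8`, `put_utf8` and the `TAG_*` constants are hypothetical, although the numeric tag values are the standard ASN.1 ones. Like the fix described above, it treats TeletexString bytes as ISO-Latin1. It passes UTF8String through without validating it, and it checks PrintableString only for ASCII, not for its narrower alphabet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Universal-class ASN.1 tag numbers of the string types listed above. */
enum {
    TAG_UTF8_STRING      = 12,
    TAG_PRINTABLE_STRING = 19,
    TAG_TELETEX_STRING   = 20,  /* T61String */
    TAG_IA5_STRING       = 22,
    TAG_UNIVERSAL_STRING = 28,  /* UCS-4, big-endian */
    TAG_BMP_STRING       = 30   /* UCS-2, big-endian */
};

/* Encode one code point as UTF-8 into `out` (at most `room` bytes). Returns
   the byte count, or 0 if it does not fit or is not a Unicode scalar value. */
static size_t put_utf8(uint32_t cp, char *out, size_t room)
{
    static const unsigned char lead[5] = { 0, 0, 0xC0, 0xE0, 0xF0 };
    size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) || n > room)
        return 0;
    if (n == 1) { out[0] = (char)cp; return 1; }
    for (size_t k = n - 1; k > 0; k--) {     /* continuation bytes, last first */
        out[k] = (char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (char)(lead[n] | cp);
    return n;
}

/* Hypothetical converter: returns the UTF-8 length (output NUL-terminated),
   or -1 for an unsupported tag, a malformed value, or a too-small `out`. */
static long asn1_string_to_utf8(int tag, const unsigned char *d, size_t len,
                                char *out, size_t cap)
{
    size_t unit, o = 0;
    if (cap == 0) return -1;
    switch (tag) {
    case TAG_UNIVERSAL_STRING: unit = 4; break;
    case TAG_BMP_STRING:       unit = 2; break;
    case TAG_PRINTABLE_STRING:
    case TAG_IA5_STRING:
    case TAG_TELETEX_STRING:   unit = 1; break;
    case TAG_UTF8_STRING:      /* already UTF-8: copied as-is (no validation) */
        if (len >= cap) return -1;
        memcpy(out, d, len);
        out[len] = '\0';
        return (long)len;
    default:
        return -1;
    }
    if (len % unit != 0) return -1;
    for (size_t i = 0; i < len; i += unit) {
        uint32_t cp = 0;
        for (size_t k = 0; k < unit; k++)
            cp = (cp << 8) | d[i + k];           /* big-endian code unit */
        if ((tag == TAG_IA5_STRING || tag == TAG_PRINTABLE_STRING) && cp >= 0x80)
            return -1;                           /* must be ASCII */
        /* TeletexString: each byte is taken as ISO-Latin1 (cp = byte value). */
        size_t w = put_utf8(cp, out + o, cap - 1 - o);
        if (w == 0) return -1;
        o += w;
    }
    out[o] = '\0';
    return (long)o;
}
```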
Comment 6•20 years ago
Please consider whether this bug also blocks bug 243738.
Comment 7•20 years ago
(In reply to comment #5)
> I'm not 100% sure, but I think comment 4 is not correct. This is an NSS
> bug, not a PSM bug.

Thank you for your explanation, and sorry for my ignorance.

> Teletex (also known as T61). Until recently, NSS treated this as
> if it, too, was a subset of UTF8, which it is not.

> I am not aware of any CAs who issue certs that encode other character sets
> (such as any Asian character sets) using TeletexString encoding. AFAIK,
> those other character sets have always been encoded using one of the
> sets named above that are properly converted to UTF8.

In comment #1, John wrote:
> I'd suggest 2b, which is what Mozilla mailnews does for headers. It is highly
> unlikely that a string in ISO-8859-1 would conform to UTF-8 syntax.

Given what Nelson wrote, 2b certainly makes sense. Strings in ISO-8859-1 are practically never valid UTF-8 strings (unless they're just ASCII strings). We have to be careful, though. Strings in Windows-1252 (a superset of ISO-8859-1) can be valid UTF-8 strings (and can be subject to misinterpretation). Nelson, is there any CA that mistakenly uses 'Teletex' to store Windows-1252 strings? I hope not, because if there is, things get complicated.
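The Windows-1252 concern comes from the byte range 0x80..0x9F. ISO-8859-1 assigns C1 control codes there, while Windows-1252 assigns printable characters. For example, the Windows-1252 string "â€™" is the byte sequence E2 80 99, which is also the valid UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK). Below is a minimal decoder sketch. The names `cp1252_to_utf8` and `uses_cp1252_range` are hypothetical, and the table is the standard Windows-1252 mapping for that range.

```c
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 code points for bytes 0x80..0x9F (C1 controls in ISO-8859-1).
   0 marks the five positions Windows-1252 leaves unassigned. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Returns 1 if any byte falls in 0x80..0x9F: a hint that a non-UTF-8 string
   is Windows-1252 rather than ISO-8859-1 text. */
static int uses_cp1252_range(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0x80 && s[i] <= 0x9F)
            return 1;
    return 0;
}

/* Windows-1252 -> NUL-terminated UTF-8. Returns the byte count, or -1 for an
   unassigned byte or a too-small `out` (3 * len + 1 bytes always suffice). */
static long cp1252_to_utf8(const unsigned char *in, size_t len, char *out, size_t cap)
{
    size_t o = 0;
    if (cap == 0) return -1;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];                     /* 0xA0..0xFF match Latin-1 */
        if (cp >= 0x80 && cp <= 0x9F) {
            cp = cp1252_high[cp - 0x80];
            if (cp == 0) return -1;
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;  /* table max is U+2122 */
        if (o + need >= cap) return -1;                   /* keep room for NUL */
        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

Because E2 80 99 passes a strict UTF-8 check, a 2b-style heuristic would silently pick the UTF-8 reading for such a string. Only knowledge of the source encoding can resolve that.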
Reporter
Updated•19 years ago
QA Contact: bishakhabanerjee → jason.m.reid
Reporter
Updated•18 years ago
Assignee: wtchang → nobody
QA Contact: jason.m.reid → libraries
Reporter
Updated•18 years ago
Summary: nicknames may be ISO-Latin1 or UTF8 → legacy nicknames may contain ISO-Latin1 or UTF8 (should all be UTF8)
Reporter
Updated•18 years ago
Priority: -- → P3
Updated•2 years ago
Severity: normal → S3