Closed Bug 53133 Opened 24 years ago Closed 21 years ago

Need conversion of T61String data from ISO8859 to UTF8

Categories

(NSS :: Libraries, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: thayes0993, Assigned: jgmyers)

References

Details

(Whiteboard: [rfc2459] [cert])

Attachments

(1 file, 1 obsolete file)

RFC2459 suggests that applications convert data encoded using T61String from 
ISO8859-1 (Latin 1) encoding to UTF8.  This is suggested because some 
applications (IE?) used T61String as a place to put 8-bit Latin character sets.

This suggestion isn't strictly correct since T61 actually uses escape sequences 
to represent a variety of different character values.  However, the current 
implementation of T61String in NSS doesn't convert those either.

We should:
  1. Allow T61String to be treated as ISO8859-1 (optionally?)
  2. Implement true T61 handling

Of these, the first is more important, and if it is the only choice implemented, 
means that applications don't need to make a decision.
Terry, could you set the Target Milestone for this bug?  Thanks.
Target Milestone: --- → 3.3
Reassigned to Terry and marked as an "enhancement".
Assignee: wtc → thayes
Severity: normal → enhancement
Target Milestone: 3.3 → 3.4
Set target milestone 4.0.
Target Milestone: 3.4 → 4.0
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
Whiteboard: [rfc2459]
Whiteboard: [rfc2459] → [rfc2459] [cert]
Terry, what is T61String?  It is not used in RFC 2459.
Wan-Teh, T61String is the same thing as TeletexString.
Blocks: 197009
Assignee: thayes0993 → jgmyers
Attached patch Proposed fix (obsolete) — Splinter Review
Attachment #138925 - Flags: review?(thayes0993)
Target Milestone: 4.0 → 3.9
John,  NSS 3.9 has shipped.  Retargetting for NSS 3.9.1.
Target Milestone: 3.9 → 3.9.1
Unlikely to meet the criteria for 3.9.1, targeting at 3.10
Target Milestone: 3.9.1 → 3.10
Attachment #138925 - Flags: review?(thayes0993) → review?(MisterSSL)
Comment on attachment 138925 [details] [diff] [review]
Proposed fix

John, Thanks for your contribution!  

Here in NSS land, a single r+ is sufficient to checkin a bug fix, except in the 
weeks just before RTM.	Also, we have the practice of giving r+ to a patch,
with 
suggested changes.  This gives the patch submittor a chance to make some 
additional changes and check it in without an additional review.  
That's whay I'm doing here.

1. I suggest the new functions contain the string "iso8859_1" instead of
"iso88591".
2. Please add some comments in secname.c, explaining that we're not doing real
T61-to-UTF8 conversion, but rather are treating T61 strings as if they were 
iso 8859-1 strings.
3. Please add one-line comments in secport.h and utf8.c explaining what each of
the
new functions does.
4. FYI, we decided long ago to stop using PR_EXTERN and PR_IMPLEMENT macros
because
they defeat the ctags program.	Instead, we created some new macros, named
NSS_EXTERN, NSS_IMPLEMENT, NSS_EXTERN_DATA, NSS_IMPLEMENT_DATA, declared in 
base/nssbaset.h, that serve the same purpose and do not confuse ctags. 
Please use those in new code.  Thanks.
Attachment #138925 - Flags: review?(MisterSSL) → review+
Addressed review comments other than comment (1), add myself as a contributor
to two files.

I deliberately chose names such as sec_port_iso88591_utf8_conversion_function
as being less ugly than sec_port_iso8859_1_utf8_conversion_function.  The
convention established by the other charaset names in those functions is to
remove '-' characters from charset names.  It's not as if NSS is going to
support any other 8859-based charset.
Attachment #138925 - Attachment is obsolete: true
Comment on attachment 139078 [details] [diff] [review]
Address most review comments

Requesting checkin approval
Attachment #139078 - Flags: superreview?(wchang0222)
Comment on attachment 139078 [details] [diff] [review]
Address most review comments

second review not required for NSS at this time.
Attachment #139078 - Flags: review+
Comment on attachment 139078 [details] [diff] [review]
Address most review comments

This patch looks good.	There are some minor problems
that should be fixed before checking it in.

1. Some compilers may warn that the switch(convert)
statement you added to lib/certdb/secname.c does not
handle the case convert_none.

2. Question: is it T.61 or T61?

3. The 'inBuf' parameter for both PORT_ISO88591_UTF8Conversion
and sec_port_iso88591_utf8_conversion_function should be
declared with 'const'.	Question: these two functions are
identical.  Why do we need two functions?

4. lib/util/secport.h (a public header) should not
include nssbaset.h (a private header). You can avoid that
by not using the NSS_EXTERN macro.

5. Do not use the NSS_IMPLEMENT macro in lib/util/utf8.c.
Do not replace it with PR_IMPLEMENT either.  Just remove
it.  (NSS uses *.def files to control the export of
symbols from DLLs.)

6. I did not review the implementation of
sec_port_iso88591_utf8_conversion_function and
test_iso88591_chars in lib/util/utf8.c.  I may try to
do that tomorrow.

If you can address 3, 4, and 5 (1 is optional), you can
go ahead and check the patch into the NSS tip.	Note:
the Mozilla trunk is using NSS_CLIENT_TAG.
Attachment #139078 - Flags: superreview?(wchang0222) → superreview+
Comment on attachment 139078 [details] [diff] [review]
Address most review comments

I just realized that nssbaset.h is a public header and
I contradicted Nelson's suggestion of using NSS_EXTERN
and NSS_IMPLEMENT.

I still think files in lib/util should not include
nssbaset.h (it is a "Stan" header) and the NSS_EXTERN
and NSS_IMPLEMENT macros should not be used on
sec_port_iso88591_utf8_conversion_function.
Checking in security/nss/lib/certdb/secname.c;
/cvsroot/mozilla/security/nss/lib/certdb/secname.c,v  <--  secname.c
new revision: 1.16; previous revision: 1.15
done
Checking in security/nss/lib/util/secport.c;
/cvsroot/mozilla/security/nss/lib/util/secport.c,v  <--  secport.c
new revision: 1.17; previous revision: 1.16
done
Checking in security/nss/lib/util/secport.h;
/cvsroot/mozilla/security/nss/lib/util/secport.h,v  <--  secport.h
new revision: 1.8; previous revision: 1.7
done
Checking in security/nss/lib/util/utf8.c;
/cvsroot/mozilla/security/nss/lib/util/utf8.c,v  <--  utf8.c
new revision: 1.4; previous revision: 1.3
done

Review responses:

1. Addressed

2. The character encoding scheme is "T.61".  The label has many names, NSS uses
"T61".

3. Declared const.  Was following the pattern set by the UCS-2 and UCS-4 to/from
UTF-8 converters.  The dual functions similarly follows the pattern set by the
UCS-2 and UCS-4 converters, but I left out implementing
PORT_SetISO88591_UTF8ConversionFunction() as I don't see any need for being able
to replace the ISO-8859-1 to UTF-8 conversion function.

It appears that PORT_SetUCS2_UTF8ConversionFunction() and
PORT_SetUCS4_UTF8ConversionFunction() have no purpose, but merely mimic the
misnamed PORT_SetUCS2_ASCIIConversionFunction().  The latter should really be
named PORT_SetUCS2_PasswordConversionFunction() as it has nothing to do with ASCII.

4. Addressed

5. Addressed



Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Wan-Teh: If you'd like me to remove one of the two functions, let me know which
function name you want to keep.
Comment on attachment 139078 [details] [diff] [review]
Address most review comments

Bob, Nelson, could you answer John's question in comment 17?

I just reviewed the actual conversion function. Since I don't
know how Latin-1 should be converted to UTF-8, I can only
verify that John's code does exactly what his comments say
the conversion should be.

John, I have two question about the /* 7F -> 0xxxxxx */ comment.
Should "7F" be "00-7F"?
Should "0xxxxxx" have one more 'x'?

I didn't review the test_iso88591_chars function.
Comment on attachment 139078 [details] [diff] [review]
Address most review comments

I finally reviewed the test_iso88591_chars function.
The only change I want to suggest is to declare the
'utf8' array with a size of 3 instead of 8:

>+    unsigned char utf8[8];

It only needs to be 3 bytes to hold the conversion
output (1-2 bytes) and the null byte we add at the
end.  This will make the test stricter.

Question about the two fprintf statements:

>+      fprintf(stdout, "Failed to convert ISO-8859-1 0x%02.2x to UTF-8\n", iso88591);

>+      fprintf(stdout, "Wrong conversion of ISO-8859-1 0x%02.2x to UTF-8: ", iso88591);

The %x format needs to match an unsigned int.  Since
'iso88591' is an unsigned char, are we relying on C's
automatic promotion of unsigned char function arguments
to unsigned int?  Other similar fprintf statements in
the file use an explicit (unsigned int) cast.
Wan-Teh asked me to look at the issue of the two functions
PORT_ISO88591_UTF8Conversion and sec_port_iso88591_utf8_conversion_function and
decide if we need to identical functions.  So, I thought about it and have the
following recommendation.

Rename PORT_ISO88591_UTF8Conversion to PORT_T61_UTF8Conversion, and have it 
call a settable pointer (just as PORT_UCS2_UTF8Conversion does).  By default,
that pointer will point to sec_port_iso88591_utf8_conversion_function.  
Move the comment about treating T.61 strings as ISO-8859-1 strings to secport.c
from secname.c, and ahve secname.c call PORT_T61_UTF8Conversion.

Rationalle (sp?): As John observed, no one will want to replace our
ISO-8859-1-to-UTF8 conversion functions with their own ISO-8859-1-to-UTF8
conversion function, but they MIGHT want to put in a TRUE T.61 conversion
function, and we should enable that.  
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Re the question in comment 19, yes we are depending on automatic promotion rules.

Re comment 20, I would like to disagree with the need for a caller settable T61
conversion function.  My reasons:

1) Such parsing of fields in certs belongs at the NSS level.  Giving the calling
application responsibility for such obscure minutiae is very poor layering. 
This is a quite different situation than PORT_SetUCS2_ASCIIConversionFunction(),
which has the application expose knowledge about the application's own password
callbacks, not about how to parse certs.

2) There is no demonstrated need for the feature.  In practice, the
TeletexString label means ISO-8859-1 encoding.  There is no customer
demonstrating deployed T.61 certs or asking support for same.
The original reason that NSS has callback for the character set conversion
functions was that certain products that used NSS had their own functions,
some of which were licensed from third parties, and they wanted to be sure
that NSS used their functions.  In fact, it was only after NSS began to be
used in other products that did not contain their own conversion libraries
that the utf8.c file was added to NSS.  

Anyway, Wan-Teh asked me to render my opinion on the issue of two functions,
one of which only calls the other, and I did.  
Nelson, could we close this bug as FIXED and open a new bug to address the
enhancement of being able to convert other T.61-labeled charsets?
Yes.  
Status: REOPENED → RESOLVED
Closed: 21 years ago21 years ago
Resolution: --- → FIXED
I would like to see the fix for this bug in NSS 3.9.1
It would be a very useful fix for me, since I am affected by the bug because my
name includes a danish letter (Lars Teglgård Smidt) which means I cannot use my
personal certificate with any mozilla product.
Changed the target milestone to NSS 3.9.1.

It turns out that this fix is already in the NSS_3_9_BRANCH
and the NSS_CLIENT_TAG (the tag used by the Mozilla trunk).
So, current Mozilla trunk builds (2004-01-28 or later)
already have this fix.
Target Milestone: 3.10 → 3.9.1
This bug and its fix affected all platforms and OSes.
OS: Windows NT → All
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: