Open Bug 336829 Opened 18 years ago Updated 1 year ago

NSS does not allow specifying a profile path containing characters outside of the system native codepage

Categories

(NSS :: Libraries, defect, P3)

3.11
x86
Windows XP

Tracking

(Not tracked)

People

(Reporter: neil.williams, Unassigned)

Details

When a cert DB directory pathname containing non-ASCII characters is specified, certutil fails, as shown by the first command below:

$ certutil -L -d 你
certutil: function failed: security library: bad database.
$ cd 你
$ certutil -L -d .
$

Notice that the cd command works and that the DB itself is OK (though empty). I traced this to mozilla/dbm/src/hash.c:173, where the stat() call fails with the pathname "你/cert8.db".
Bugzilla seems to display committed comments differently from the way they were entered, so here is a hex dump of the command:

$ od -cx x.bat
0000000000     c   e   r   t   u   t   i   l       -   L       -   d       Σ
                6563    7472    7475    6C69    2D20    204C    642D    E420
0000000020     ╜   á  \r  \n  \r  \n  \r  \n
                A0BD    0A0D    0A0D    0A0D
0000000030
$

This was reported by a user who had named a directory with a single Chinese character.

Priority: -- → P3
Changing the bug title to reflect the underlying issue, as diagnosed in bug 413171.
Summary: certutil fails when pathnames have multi-byte characters → NSS does not allow specifying a profile path containing characters outside of the system native codepage
The solution to this *appears* simple.

Maybe someone could explain why this wouldn't work.

Currently, mozilla/xpcom/io/nsLocalFileWin.cpp has:
NS_IMETHODIMP
nsLocalFile::GetNativePath(nsACString &_retval)
{
    //NS_WARNING("This API is lossy. Use GetPath !");
    nsAutoString tmp;
    nsresult rv = GetPath(tmp);
    if (NS_SUCCEEDED(rv))
        rv = NS_CopyUnicodeToNative(tmp, _retval);

    return rv;
}

Instead, I propose:

NS_IMETHODIMP
nsLocalFile::GetNativePath(nsACString &_retval)
{
    //NS_WARNING("This API is lossy. Use GetPath !");
    nsAutoString tmp;
    nsresult rv = GetPath(tmp);
    if (NS_SUCCEEDED(rv))
-       rv = NS_CopyUnicodeToNative(tmp, _retval);
+       _retval = NS_ConvertUTF16toUTF8(tmp);  // UTF-8 can represent every character, so nothing is lost

    return rv;
}
----
Then no information will be lost. Wherever the path is passed back to the OS, the code must already be converting it with NS_CopyNativeToUnicode (as the OS doesn't accept local charsets natively), so change that call to the corresponding NS_ConvertUTF8toUTF16 call...

The problem would be transparently solved, with no backward-incompatible changes other than that things that used to fail will now work.
Because it would break existing code (including third parties' code) that expects GetNativePath to return the path in the file system charset.
The file system charset is UTF-16. Current systems require the file system to be NTFS for security reasons, so the local code page is always UTF-16.

If you need to use the "A" interface, set your code page to 65001 to get UTF-8, which will store all the characters without corrupting them.
I don't understand how this bug (or bug 852086, which was marked as a dup of this one) is different from the FIXED bug 469856. In that bug we switched Mozilla to use the "short pathname", which will be ASCII, and that appeared to fix most people's issues. Is it possible that the people who still experience this bug have disabled short 8.3 pathname generation?
I regularly turn off 8.3 pathname generation on NTFS file systems, as it cuts the number of filenames on my system in half.

This increases lookup speed (by a factor of 2) and significantly decreases the size of the NTFS name area.

At this point, I assume programs running on WinXP and above are Unicode capable.

*Theoretically*, code page 65001 SHOULD be able to store the characters of all locales.

But I would say it is very bad practice to rely on 8.3 pathname generation, as it was supposed to be a transitional method for porting non-Unicode programs from the Win98/WinME days to Unicode, which should be fully supportable in UTF-16 (or UTF-8)...
(In reply to L A Walsh from comment #8)
> If you need to use the "A" interface, set your code page to 65001, to get
> UTF-8 -- which will store all the characters without corrupting them.

How did you change the system code page to 65001? It was not possible until a very recent Windows 10 Insider Preview.
`chcp 65001` does not change the behavior of "A" functions.

Even if I could change the code page, we can't force every Firefox user to change it.
chcp works in Win7.  Maybe Win8 or early Win10 disabled it?

You are right that the behavior of the A functions is not changed, but... hmm... I'm not a Windows programmer, I plead the 5th. OK, I was hoping the "A" functions would accept "bytes"...

I.e. if you have a UTF-8 char that is more than 1 byte long, I was hoping that it would take a byte stream, though I'm not sure what I had in mind 5 years ago.
But if you have a string "AäA", it's actually a byte string 4 bytes long:
> echo -n "AäA"|hexdump -C
00000000  41 c3 a4 41                                       |A..A|
00000004
or in hex:
> echo "\x41\xc3\xa4\x41"
AäA
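The two middle bytes follow from the UTF-8 rule for code points U+0080 to U+07FF: the 11-bit value is split into its high 5 bits and low 6 bits and packed as 110xxxxx 10xxxxxx. Worked through for ä (U+00E4):

```latex
\begin{aligned}
\texttt{0xE4} &= 228 = \underbrace{00011}_{\text{high 5 bits}}\,\underbrace{100100}_{\text{low 6 bits}}\;{}_{2} \\
\text{byte}_1 &= 110\,00011_2 = \texttt{0xC3} \\
\text{byte}_2 &= 10\,100100_2 = \texttt{0xA4}
\end{aligned}
```

With the two single-byte A's (0x41) around it, that gives exactly the 41 C3 A4 41 shown by hexdump.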
----
in cmd.exe in win7:
C:\bin>.\echo -ne "\x41\xc3\xa4\x41"
AäA

Here's proof that chcp is doing something in Win7:
New cmd.exe:
C:\bin>chcp
Active code page: 437

Then use Cygwin's echo to create a file with UTF-8 in it:
C:\bin>.\echo -ne "\x41\xc3\xa4\x41" >\tmp\f1.txt


Then use the Windows 'type' command to list the contents (note the garbled output):

C:\bin>type \tmp\f1.txt
A├ñA

Then change to utf-8 Codepage and display again.  This time, type displays it correctly:

C:\bin>chcp 65001
Active code page: 65001

C:\bin>type \tmp\f1.txt
AäA

So type would be using the "same" interface (it looks like the 8-bit "A" interface). When the code page is set to UTF-8, it still displays the UTF-8 encoded byte stream correctly.

Users can't be made to run chcp, but if they want to use "wide" characters with the "A" interface, they need to use UTF-8, which encodes the "wide" characters into a byte stream that seems to display correctly.

Not sure about other uses... need more testing for that...
chcp will change the encoding of file "content" and console input/output, but will not change the file "name" handling of "A" APIs.
By the way, I changed the NSS file path handling in bug 1428538. NSS 3.35 or later will use UTF-8 for the new sqlite db path. It still uses the ANSI code page for old DBM dirs for compatibility with old versions.
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 7 duplicates.
:beurdouche, could you consider increasing the bug severity?


Flags: needinfo?(bbeurdouche)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(bbeurdouche) (cleared)