Closed Bug 413171 Opened 16 years ago Closed 14 years ago

NSS does not allow specifying a profile path containing characters outside of the system native codepage

Categories

(NSS :: Libraries, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 336829

People

(Reporter: ted, Unassigned)

Details

(Keywords: helpwanted)

Attachments

(1 file, 1 obsolete file)

I have a profile in a Windows XP VM that has Chinese characters in the username (en-US Windows).  With the 2008-01-19-04 build, I get an error message on startup:
"Could not initialize the application's security component. The most likely cause is problems with files in your application's profile directory. Please check that this directory has no read/write restrictions and your hard disk is not full or close to full. It is recommended that you exit the application and fix the problem. If you continue to use this session, you might see incorrect application behaviour when accessing security features."

I never used to get this.
See also bug 396052, (was blocking-1.9+) and bug 396199
Ted gave me some more details that might help to reproduce this bug.

I'm not able to work on this right away, so I'm dumping the information into this bug. I'd greatly appreciate any help to debug where exactly NSS is failing.

Ted: "I think PSM or NSS is doing something wrong on windows during startup. probably using GetNativePath."

Is there a way to type Chinese characters on my system?

additional info: "i created a user on my windows xp VM with a chinese username. 
 i just copy/paste them from wikipedia. http://en.wikipedia.org/wiki/Hanzi . if you have XP, you can enable the "select a user" screen and just click on it."
Keywords: helpwanted
I can debug this if you can point me at a function to step through.
I'll happily blame http://mxr.mozilla.org/mozilla/source/security/manager/ssl/src/nsNSSComponent.cpp#1519

But the problem is really that there isn't a wide-character version of NSS_InitReadWrite: http://mxr.mozilla.org/mozilla/source/security/nss/lib/nss/nss.h#110
Bummer.  Could NSS be changed to expect a UTF-8 string there?  Not declaring what charset you're accepting with a char* is sort of bogus.
I doubt it, NSS has very strict backwards-compatibility requirements. You'd have better luck adding NSS_InitW and NSS_InitReadWriteW variants, I think.
This bug claims that NSS_InitReadWrite does not accept UTF-8 for configdir.

=> NSS for triaging
Assignee: kengert → nobody
Component: Security: PSM → Libraries
Product: Core → NSS
QA Contact: psm → libraries
Version: Trunk → unspecified
Well... it shouldn't accept UTF8 IMO... the previous behavior and normal behavior for APIs that take char* is to use the system native codepage, which is never UTF8. Of course Windows sucks and you can't represent the full unicode repertoire with any of its native codepages.
I think it's going to be difficult to reproduce this or debug it without 
having an exact copy of the directory pathname string passed to NSS_Init*.  
Could you add a comment with a dump in hexadecimal of the string with which 
the failure occurs? 

Also, what character sets is your windows box configured to use?  
What "code page" is it configured to use?

IIRC, the only requirement that NSS puts on the strings passed to NSS_Init* 
for directory names is that they NOT be "wide" character sets, such as UTF-16 
(a.k.a. UCS-2).  Any single-byte or "multi-byte" (variable width) character 
set, in which a single NUL byte terminates a string, is acceptable to NSS, 
assuming it is also acceptable to the underlying OS.  

In short, the caller of NSS_Init* must pass a string that is acceptable to 
the OS in the call to the OS's open function.  The required character set is whatever the OS expects for the open and stat calls.

ISTR there's some code that converts forward slash to backslash for path 
names when loading DLLs.  I wonder if that's matching on one byte of a 
multi-byte character.  Having a copy of the original directory name string 
would confirm or refute that hypothesis.

Also, ISTR there is some code in NSPR that calls Windows functions to convert 
between wide and narrow character strings.  I think the character set is 
implicitly chosen as whatever narrow character set ("code page") for which 
the OS has been configured.  I suspect this bug is going to end up being 
converted into an NSPR bug before it's done.
Nelson: the problem here is that you can have Unicode characters in a file path that are not representable in a Windows native encoding.  My VM is probably using Windows-1252, since it's just en-US, and the path contains Chinese characters, which have no representation in that encoding.  Using GetNativePath() at all with a path like that will not work.
The problem is that there is *no* char* string which can represent the actual unicode directory in question.

e.g. create a directory named C:\ขฃ (Thai characters) in any version of windows. There is no char* which can represent this directory, and you cannot open a file in that directory using a call to open(). You have to use WCHAR* and _wopen to open a file in this directory.
(In reply to comment #10)
NSS contains no calls to GetNativePath. Neither does NSPR. However PSM does.

(In reply to comment #11)
> The problem is that there is *no* char* string which can represent the actual
> unicode directory in question.

Well, of course, anything that can be represented in unicode can be represented in UTF-8 in a char * string.  

Perhaps the problem is that Windows cannot be persuaded to interpret the path 
name in a FileOpenA call as a UTF8 string?
Correct. The Windows "A" APIs are incapable of opening a file named with Thai characters.

PSM could certainly pass a UTF8 char* to NSS_Init, but I didn't think that would match your needs for backwards compatibility.
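As an illustration of the exchange above, here is a minimal sketch of how a UTF-16 path could be carried losslessly in a char* string as UTF-8. The function name `Utf16ToUtf8` is hypothetical and is not part of NSS, NSPR, or PSM. The sketch only shows that the encoding step itself loses nothing; it does not change the fact that the Windows "A" APIs would not interpret the result as UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD so the output stays well-formed.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into a single code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (isHigh || isLow) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the Thai directory name mentioned earlier, each character becomes a three-byte sequence, so the path fits in an ordinary NUL-terminated char* string.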
So, I think our choices are roughly among these:

a) redefine the existing NSS_Init* functions, declaring that the char* string 
parameters that they take for file/directory pathnames and prefixes are UTF-8
by definition.  This probably is bad for backwards compatibility, at least 
on Windows.

b) redefine the existing NSS_Init* functions, declaring that IFF the string 
argument values contain non-ASCII characters and also are valid UTF-8 strings,
then we will translate them into UCS2 and use wide-character functions for 
opening/stating files, etc., at least on Windows.  

I think most non-ASCII strings containing characters from narrow character 
sets other than UTF-8 are NOT also valid UTF-8 strings, so the test for 
"is this a valid UTF-8 string" (which NSS already has) would be a pretty 
accurate way to differentiate between UTF-8 strings and strings in other 
non-ASCII character sets.  If that is correct, this choice wouldn't be an incompatibility for most users.  But there may exist some non-ASCII strings that can be interpreted as valid UTF-8 AND as valid strings in some other character set that is a native character set in Windows.  
If so, that could be a problem.

c) We could define a new set of NSS_Init* functions that are defined to 
take only UTF8 strings for path names, etc., as opposed to the present ones
that seem to take strings in the OS's present narrow character set ("code page").  I think this choice is the most likely to succeed of any of these.

d) We could define a new set of NSS_Init* functions that take wide character
strings instead of narrow character strings.  Frankly I'm afraid this would 
be a nightmare to do in NSS.  
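The heuristic in choice (b) can be sketched roughly as follows. This is only an illustration: `IsValidUtf8` and `LooksLikeUtf8Path` are hypothetical names, and this is not the UTF-8 test that NSS already has. It assumes a strict validity check that rejects overlong forms, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool IsValidUtf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0, min = 0;
        if (b0 < 0x80) { ++i; continue; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;                      // stray continuation or invalid lead byte
        if (i + len > n) return false;          // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                       // overlong, out of range, or surrogate
        i += len;
    }
    return true;
}

// Choice (b): treat a path as UTF-8 only if it contains non-ASCII bytes
// AND is valid UTF-8; otherwise keep the native-codepage interpretation.
bool LooksLikeUtf8Path(const std::string& s) {
    bool hasNonAscii = false;
    for (unsigned char c : s) {
        if (c >= 0x80) { hasNonAscii = true; break; }
    }
    return hasNonAscii && IsValidUtf8(s);
}
```

A lone byte 0xE6 (ć in Windows-1250) is rejected because it looks like a truncated three-byte sequence, which supports the claim that most native-codepage strings fail the test. The ambiguity is also real, though: the bytes 0xC3 0xA9 are valid UTF-8 for "é" and equally valid Windows-1252 text, so the heuristic would classify that native string as UTF-8.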

I need to learn about what non-Windows OSes allow in pathnames.  
Do Linux, MacOS and most Unixen have wide-character forms of open?  
Do they accept UTF8 file names in open?  
Do they accept other non-ASCII non-UTF8 character strings in open?

The NSS team needs to discuss this in our next meeting.  
Help me gauge the urgency of this.
I gather that this is not perceived as a show-stopper for FF3, right?
Version: unspecified → 3.0
Mac and Linux generally use UTF-8 natively, as I understand it, so they don't have this problem.  From within PSM, we could do c) or d) very easily.

This is probably not a showstopper for Firefox 3, but we've spent a bit of time making sure that this case works, and it'd be a shame to regress it at the last minute.
> Do Linux, MacOS and most Unixen have wide-character forms of open?  

No.

> Do they accept UTF8 file names in open?  

Yes, *if* you have LANG set to UTF8, which is the default on modern linuxes/macos (mine is en_US.UTF-8)

> Do they accept other non-ASCII non-UTF8 character strings in open?

The LANG environment variable determines the codepage used for the filesystem. So if you had LANG set to en-US.EUC-JP then the char* passed to open() and other system calls would be interpreted as EUC-JP instead of UTF8.

I don't think this can be a FF3 stop-ship blocker, but we've really been trying to be fully unicode-correct for FF3.
One element of this bug seems confusing.  In comment 0 and in comment 15, Ted seems to suggest that this is a regression in some recent version of the 
browser.  But the situation Benjamin describes, if relevant, would suggest 
that this has NEVER worked with NSS.  

So, I'd like to narrow this bug down, and decide:
Is this a problem that has always existed in NSS? or 
Is this a regression in some recent version of the browser?

Can we get a regression window?  Or at least some browser versions with which
the problem is not seen, as well as some version(s) with which it is seen?
Is it only a trunk issue?

Let me repeat that I'd really like to get a hex dump of the string(s) passed
to NSS_Init* by PSM when this failure occurs.
I think given the date of that code (from blame), this is probably not a regression.  I was probably mistaken in my original assessment.  We fixed a number of issues related to proper Unicode support in file paths, see the bugs mentioned in comment 1.  I thought I had tested after those landed, but I must have been wrong.  Regardless, I still think that being fully Unicode aware on Windows is important.
Well, FF3 shipped and this bug is stopping me from using it. My home path is "C:\Documents and Settings\Dženan Zukić". "ć" is the only non-en_US character. If you want, you can copy it from this comment.
Flags: wanted1.9.0.x?
Flags: blocking1.9.1?
Blocks: 440766
(In reply to comment #20)
> Well, FF3 shipped and this bug is stopping me from using it. My home path is
> "C:\Documents and Settings\Dženan Zukić". "ć" is the only non-en_US
> character. 

I'll wager that "ž" is also not an ASCII character.  :)

Dženan, are you also a FF2 user?  
Does FF2 have the same problem?  (I expect that it does)
Two points:

1) This should only be a problem if you have characters in your username that are not representable in your system's native codepage. Non-ASCII is fine, as long as it's in your codepage. If you're using Windows-1251 and trying to use non-ASCII, then you have a problem here.
2) Firefox 2 is likely to have even worse problems, as we fixed a number of Unicode path handling bugs in the startup/profile system for Firefox 3.
ž is part of code page 1252 (the default for en-US), at code point 158. However, ć is not, so it is not representable.
I use Firefox 2 at university (where this problem occurs). I had it before, and now I have gone back to it again (because I obviously cannot use Firefox 3).
FF2, along with many other applications, created another profile folder, "C:\Documents and Settings\Dženan Zukic", with "c" instead of "ć", and uses it consistently, so there is no problem.
I would say that some parts of FF3 use Unicode paths properly and other parts don't, which results in FF3 not being able to start up at all.
Firefox 2 does not have a problem with the profile being named "Dženan Zukić", because FF2 consistently uses the folder "Dženan Zukic". That is wrong, and would make quite a mess if there were another user named "Dženan Zukic" besides the one named "Dženan Zukić". However, that is not the case, and FF2 works as it should (although it stores its data in the wrong folder, which is a possible security issue with permissions).
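The representability point can be checked mechanically. The following is a small sketch with hypothetical names (`ToCp1252`, `IsRepresentableInCp1252`). It hard-codes the Windows-1252 assignments for bytes 0x80 to 0x9F and treats the five unassigned bytes as unmappable. It performs a strict lookup only; it does not model any "best fit" substitution that a real Windows conversion might apply.

```cpp
#include <string>

// Unicode code points assigned to bytes 0x80..0x9F in Windows-1252.
// 0 marks the five bytes that Windows-1252 leaves undefined.
static const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

// Hypothetical helper: returns the Windows-1252 byte for `cp`, or -1 if
// the code point has no exact representation in that codepage.
int ToCp1252(char32_t cp) {
    // ASCII and the Latin-1 range 0xA0..0xFF map to the same byte value.
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) return static_cast<int>(cp);
    for (int i = 0; i < 32; ++i) {
        if (kCp1252High[i] != 0 && kCp1252High[i] == cp) return 0x80 + i;
    }
    return -1;
}

// True only if every character of the path survives the conversion exactly.
bool IsRepresentableInCp1252(const std::u32string& s) {
    for (char32_t cp : s) {
        if (ToCp1252(cp) < 0) return false;
    }
    return true;
}
```

With this table, ž (U+017E) maps to byte 158 while ć (U+0107) has no mapping, so a path containing "Dženan Zukić" cannot be converted exactly, whereas "Dženan Zukic" can.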
Blocks: tomtom
(In reply to comment #13)
> Correct. The Windows "A" APIs are incapable of opening a file named with Thai
> characters.

What if the file path is first passed through GetShortPathName() before it goes to NSS? I don't see any documented guarantee that the result of GetShortPathName is ASCII but it is still better than nothing.
Attached patch Quick and dirty patch (obsolete)
This patch takes care of the issue for me - I can use XULRunner despite the profile path containing Cyrillic characters. It also cures the crash (bug 440766).
Attachment #329051 - Flags: review?
Comment on attachment 329051
Quick and dirty patch

Kai, bsmedberg, could you take a look at this?
Attachment #329051 - Flags: review?(kaie)
Attachment #329051 - Flags: review?(benjamin)
Attachment #329051 - Flags: review?
I was told that there is a registry setting to disable 8.3 names in NTFS - new files created won't have short names then, and GetShortPathName() will fail for them. But since that setting isn't the default (and is unlikely to ever become the default for reasons of backwards compatibility), it is a minor issue compared to the major issue that we have right now.
Comment on attachment 329051
Quick and dirty patch

Another approach is to try the original pathname first.
If it fails, then try the short (canonical) pathname as
a fallback.
Attachment #329051 - Flags: review?(benjamin) → review+
Comment on attachment 329051
Quick and dirty patch

Please brainstorm about Wan-Teh's proposal and if you believe his proposal is better, please attach a new patch.

If you're certain this patch is ok, moa=kaie
Attachment #329051 - Flags: review?(kaie)
Wan-Teh's proposal:

1. The general assumption is that using the original path is better than using the canonical path. So far I don't see any evidence confirming that. The canonical path, however, is always less problematic than the original path.
2. "Try and check if it fails" assumes that we know where it will fail. However, the code path that fails now might change later, and we would then happily accept an original path that fails further down the road.

Which means that so far I don't see any reason to take this route (please prove me wrong). Note that we are already using my patch for TomTom HOME, without any issues reported so far.
PS: I have no idea what "moa" means...
Wan-Teh, perhaps you should elaborate slightly on your proposal. 
I have understood it to be a proposal for PSM to call the NSS initialization
function potentially twice, calling it first as it does today, and if that
fails, to then try again using the path string obtained from 
GetShortPathName().  Is that a correct understanding of the proposal?

Wladimir,  TomTom HOME uses NSS ?
(I don't know what moa means either.)
Yes, that's my proposal.  Here is the precise version:

  Try calling NSS_Init* with the original pathname first.
  If NSS_Init* fails, then try calling NSS_Init* with the
  short (canonical) pathname as a fallback.

moa stands for module owner approval.
There are three calls: NSS_InitReadWrite, NSS_Init and NSS_NoDB_Init (the second being the fallback for the first, and the third the fallback for the second). Since write permissions are independent of whether we can use the original path, we would have to add a new fallback to each of those calls, so there would now be six initialization attempts. But I still don't see why we should make that effort.

TomTom HOME uses XULRunner, which in turn uses NSS, and we applied my patch a while ago due to numerous complaints from our Chinese department.
There are actually (at least) 5 flavors of NSS initialization functions.
In addition to the 3 you mentioned, there are also NSS_Initialize and 
NSS_InitWithMerge.  Of those, NSS_Initialize is the most general.  It can
do anything that the others can do.  You can think of the others as 
simplified versions of NSS_Initialize that are used in common cases.
Nelson, I only listed the initialization functions actually being called in the code. We would need fallbacks for each of those calls - changing NSS_Initialize won't work because at that point we don't have an nsILocalFile any more, so we cannot get the canonical path.
The purpose of calling NSS_InitReadWrite with the original pathname
first is to minimize the change from the current code.  We only need
to retry NSS_InitReadWrite with the short (canonical) pathname if
the error code is SEC_ERROR_BAD_DATABASE.  You can add this second
NSS_InitReadWrite call between the NSS_Init and NSS_NoDB_Init calls.
(NSS_NoDB_Init ignores its argument.)  Alternatively, you can add a
helper function that does the NSS_InitReadWrite and NSS_Init sequence,
and you call this helper function once with the original pathname
and once with the short (canonical) pathname.  Again, NSS_NoDB_Init
only needs to be called once, with "" or NULL as the argument.

If you are sure that it is safe to just use the short (canonical)
pathname, using your patch has the advantage that the new code is
always used -- any bug in using the short (canonical) pathname
will be immediately noticed.  Since I am not familiar with short
pathnames, I propose a more conservative approach.
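The helper-function variant described above might look roughly like this. It is a sketch, not PSM code: the `InitHooks` callbacks are hypothetical stand-ins for NSS_InitReadWrite, NSS_Init and NSS_NoDB_Init so that the control flow can be shown without linking against NSS. For brevity it retries on any failure rather than only on SEC_ERROR_BAD_DATABASE.

```cpp
#include <functional>
#include <string>

enum class InitResult { ReadWrite, ReadOnly, NoDB };

// Hypothetical stand-ins for the three NSS initialization calls.
struct InitHooks {
    std::function<bool(const std::string&)> initReadWrite;  // cf. NSS_InitReadWrite
    std::function<bool(const std::string&)> initReadOnly;   // cf. NSS_Init
    std::function<bool()> initNoDB;                         // cf. NSS_NoDB_Init
};

// The read-write then read-only sequence for a single pathname.
static bool TryWithPath(const InitHooks& hooks, const std::string& path,
                        InitResult* result) {
    if (hooks.initReadWrite(path)) { *result = InitResult::ReadWrite; return true; }
    if (hooks.initReadOnly(path))  { *result = InitResult::ReadOnly;  return true; }
    return false;
}

// Try the original pathname first, then the short (canonical) pathname,
// and call the no-database fallback exactly once at the end.
bool InitWithFallback(const InitHooks& hooks, const std::string& originalPath,
                      const std::string& shortPath, InitResult* result) {
    if (TryWithPath(hooks, originalPath, result)) return true;
    if (!shortPath.empty() && shortPath != originalPath &&
        TryWithPath(hooks, shortPath, result)) {
        return true;
    }
    if (hooks.initNoDB()) { *result = InitResult::NoDB; return true; }
    return false;
}
```

In this arrangement the worst case is five attempts (two per pathname plus one no-database call), and the existing behaviour is unchanged whenever the original pathname works.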
Attaching updated version of the patch (diffed against 1.9.1b2 and proper format), taking review over from previous patch.
Attachment #329051 - Attachment is obsolete: true
Attachment #353007 - Flags: review+
This bug documents a deficiency in NSS's API, and it should remain an NSS bug.
But the patch being pursued in this bug is not an NSS patch.  It is a PSM 
patch.  It would be best if a new bug were filed against PSM 
(product:core, component:security/PSM) about this, and this patch were
attached to that bug, reviewed there, etc.  When this patch is
eventually committed, it will (probably) solve the problem being experienced
by PSM, but it will not solve the problem in NSS.

Also, in NSS, we don't allow the practice of "carrying forward" an r+ from
one patch to another.  Each patch gets separately reviewed.  

So, please make a PSM bug.  Thanks.
No longer blocks: tomtom, 440766
Flags: wanted1.9.0.x?
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Created bug 469856 for PSM.
Summary: "could not initialize the application's security component" with non-native characters in the profile path → NSS does not allow specifying a profile path containing characters outside of the system native codepage
It has been almost a year since the last comment. Is there any progress on this?
Dženan, the progress on bug 469856 seems to have removed all pressure to fix
this bug.  Will the solution to bug 469856 not solve the problem for you?
Nelson, the solution to bug 469856 is the patch in this bug...
... which is a PSM patch, not an NSS patch.  NSS people cannot review/approve
PSM patches and PSM people cannot review/approve NSS patches.  

Try this.  Take the patch that is currently attached to this bug and attach 
it to bug 469856, and ask some PSM people to review and approve it.
Comment on attachment 353007
Quick and dirty PSM patch, updated (checked in)

This PSM patch was committed.  See bug 469856.
Attachment #353007 - Attachment description: Quick and dirty patch updated → Quick and dirty PSM patch, updated (checked in)
I believe this bug, and bug 336829 are duplicates of each other.
Sounds right. I'll dupe this bug over since it's got a lot of irrelevant content.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
The solution to this *appears* simple.


Maybe someone could explain why this wouldn't work.

Currently, mozilla/xpcom/io/nsLocalFileWin.cpp
has:

NS_IMETHODIMP
nsLocalFile::GetNativePath(nsACString &_retval)
{
    //NS_WARNING("This API is lossy. Use GetPath !");
    nsAutoString tmp;
    nsresult rv = GetPath(tmp);
    if (NS_SUCCEEDED(rv))
        rv = NS_CopyUnicodeToNative(tmp, _retval);

    return rv;
}


Instead, I propose:


NS_IMETHODIMP
nsLocalFile::GetNativePath(nsACString &_retval)
{
    //NS_WARNING("This API is lossy. Use GetPath !");
    nsAutoString tmp;
    nsresult rv = GetPath(tmp);
    if (NS_SUCCEEDED(rv))
-       rv = NS_CopyUnicodeToNative(tmp, _retval);
+       CopyUTF16toUTF8(tmp, _retval);

    return rv;
}

----
Then no information will be lost, and wherever you are passing the path back to the OS, you have to be using NS_CopyNativeToUnicode (as the OS doesn't accept local charsets natively), so change that call into the corresponding NS_ConvertUTF8toUTF16 call...

Problem transparently solved, with no backward-incompatible changes other than that things that used to not work now will.
(comment appended to the bug this was duped to as well as here...)
The OS does accept local charsets through all of the "A" functions as well as stdio functions like "open". The "native path" from nsILocalFile represents the path that you would pass to "open", which is unfortunately lossy but there's no helping Windows suckage.