Closed Bug 3690 Opened 26 years ago Closed 24 years ago

File path strings should be stored in registry as UTF-8

Categories

(Core Graveyard :: Tracking, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: saari, Assigned: rayw)

References

Details

(Whiteboard: investigating)

Mac apprunner starts up with blank window. The RDF_DLL is failing to load.
Summary: Mac apprunner starts up with blank window. → [BLOCKER] Mac apprunner starts up with blank window.
GetSharedLibrary() in prlink.c called from PR_LoadLibrary fails to find RDF_DLL
Assignee: don → mcmullen
Summary: [BLOCKER] Mac apprunner starts up with blank window. → [BLOCK] Mac apprunner starts up with blank window.
Target Milestone: M3
Re-assigned to mcmullen@netscape.com, changed target milestone to M3, and
changed summary line slightly.

John, we need some Mac expertise here to help Chris Saari ...
John, Pink says this is happening for Joe Francis too.
There is a workaround: remove all non-7-bit ascii characters and slashes from all
superdirectories of your apprunner file.

The problem seems to be in the DLL registration mechanism (use of full paths).
Joe Francis had a bullet character in one of his superdirectory names, and Chris
Saari had a slash in his hard drive name.

Reassigning to dp.

Removing the [BLOCK] from the summary line.  This is a bad bug (we can't ship
with us) but we have the time to fix it properly.
.s/us/this/
Clarification: the error is that PR_LoadLibrary fails for components in the
Components subdirectory, if non-unix characters (or a unix separator) are in the
full path.

PR_LoadLibrary only works for (1) full paths or (2) DLL name for files directly
in the launch directory.  In the bad case, for some reason, a call finally gets
made to PR_LoadLibrary with the string 'RDF_DLL'.  This call will never succeed,
because the loading code does not search recursively in subdirectories.

We do not understand why having interesting chars in the path causes this
failure, though.
Priority: P3 → P2
Additional info: when all is working well, nsFactoryEntries for factories coming
from DLLs in the Components folder contain UNIX-style full paths to those DLLs.
When there are 8-bit chars in any of the parent folders, these factory entries
contain the DLL name (fragment name) in the m_fullpath member.
OK, I found the real bug. Strings in the registry should be stored as UTF8, and
the registry throws back an error from NR_RegSetEntryString() when you attempt
to store a string that is invalid UTF8 (like Mac paths containing 8-bit chars).
But, of course, no-one is checking return values of ANY of the registry calls
in the component manager code, so we missed the error.

The call that failed is on line 494 of nsComponentManager.cpp.
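
To illustrate the failure mode described above, here is a minimal sketch of a well-formedness check for UTF-8 plus a setter whose result the caller is expected to test. `IsValidUtf8`, `StoreResult` and `StorePathString` are hypothetical names made up for this illustration; they are not the libreg API and do not reproduce how NR_RegSetEntryString() validates its input. The point is only that a MacRoman bullet (byte 0xA5) on its own is not valid UTF-8, so a store that validates will refuse it, and the refusal is only visible if the return value is checked.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates and code points
// above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point
        if (c < 0x80) { ++i; continue; }               // plain 7-bit ASCII
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;                             // invalid lead byte
        if (i + extra >= n) return false;              // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80) return false;     // not a continuation
            cp = (cp << 6) | (cc & 0x3F);
        }
        static const unsigned long kMin[] = {0, 0x80, 0x800, 0x10000};
        if (cp < kMin[extra]) return false;            // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate
        if (cp > 0x10FFFF) return false;               // out of range
        i += extra + 1;
    }
    return true;
}

// Hypothetical stand-in for a registry setter that rejects invalid UTF-8.
enum StoreResult { kStoreOk, kStoreBadUtf8 };

StoreResult StorePathString(const std::string& value, std::string* slot) {
    if (!IsValidUtf8(value)) return kStoreBadUtf8;  // caller must check this
    *slot = value;
    return kStoreOk;
}
```

A caller that ignores the result of `StorePathString` would silently end up with an empty slot, which is the shape of the bug described in this comment.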
Status: NEW → ASSIGNED
Yes it is a registry problem. Robert and I found that in our session of
debugging too. A bug has been filed on dan veditz. If we don't get it fixed, we
need to release note this.
Um, shouldn't you be passing UTF8 strings into the registry, instead of just
assuming that it's a registry bug?
OH I understand what you mean now. You are right. So how do I convert to UTF8
and back...I will try that. frank frank where are you...
dp: think about the Japanese user, whose directory names on Mac may be full
of Japanese characters (including 2-byte characters). Now be scared about
doing anything with raw file paths  :-)
Severity: normal → major
Priority: P2 → P1
Setting to P1.  Will check with Mar15 build when it comes out.
We are going to release note this one. Keeping open for that reason.
I mean release note for dogfood...
Why not store the URL? It's another 7-bit encoding, and supported with
nsFileURL.  Is that good enough for Japanese?
Target Milestone: M3 → M4
add the release note that you want to bug
http://bugzilla.mozilla.org/show_bug.cgi?id=3646
moving this off the M3 list.  move it back if we think we have a fix
Target Milestone: M4 → M5
I think this can be release noted for M4 as well.
Summary: Mac apprunner starts up with blank window. → [PP]Mac apprunner starts up with blank window.
What's up with this bug? Is it going to fade into obscurity?
dp has been in india visiting his family.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Well, now Mac apprunner starts up with an http error (April 25 build).  On 2nd
launch, No longer a blank window.  So, I would say this particular bug is Fixed
now.  saari, will leave to you to mark Verified. ;-)
Status: RESOLVED → REOPENED
This bug is not fixed. I've adjusted the summary to reflect the real issue.
Resolution: FIXED → ---
Summary: [PP]Mac apprunner starts up with blank window. → File path strings should be stored in registry as UTF-8
Changed summary to "File path strings should be stored in registry as UTF-8".
We should not be storing full paths in the registry at all; rather, we should
use the nsPersistentFileDescriptor that mcmullen has written. And if
nsPersistentFileDescriptor outputs anything that is not always 7-bit ascii, then
its output needs to be stored as UTF-8.

QA, to verify this bug (when fixed), you need to put 8-bit ascii chars, and
slashes in your directory names before calling it fixed.
nsPersistentFileDescriptor outputs only 7-bit ascii (using base64 encoding).

I include the source of two functions from libpref that show the correct encoding
and decoding of filespecs for persistent storage.

//------------------------------------------------------------------------------
NS_IMETHODIMP nsPref::GetFilePref(const char *pref_name, nsFileSpec* value)
//------------------------------------------------------------------------------
{
    if (!value)
        return NS_ERROR_NULL_POINTER;
    char *encodedString = nsnull;
    PrefResult result = PREF_CopyCharPref(pref_name, &encodedString);
    if (result != PREF_NOERROR)
        return _convertRes(result);

    nsInputStringStream stream(encodedString);
    nsPersistentFileDescriptor descriptor;
    stream >> descriptor;
    PR_Free(encodedString); // Allocated by PREF_CopyCharPref
    *value = descriptor;
    return NS_OK;
}

//------------------------------------------------------------------------------
NS_IMETHODIMP nsPref::SetFilePref(const char *pref_name,
				    const nsFileSpec* value, PRBool set_default)
//------------------------------------------------------------------------------
{
    if (!value)
        return NS_ERROR_NULL_POINTER;
    nsresult rv = NS_OK;
    if (!value->Exists())
    {
        // nsPersistentFileDescriptor requires an existing
        // object. Make it first.
        nsFileSpec tmp(*value);
        tmp.CreateDir();
    }
    nsPersistentFileDescriptor descriptor(*value);
    char* encodedString = nsnull;
    nsOutputStringStream stream(encodedString);
    stream << descriptor;
    if (encodedString && *encodedString)
    {
        if (set_default)
            rv = PREF_SetDefaultCharPref(pref_name, encodedString);
        else
            rv = PREF_SetCharPref(pref_name, encodedString);
    }
    delete [] encodedString; // Allocated by nsOutputStringStream
    return rv;
}
I just noticed that last line should be

    return _convertRes(rv);

Nobody's perfect.
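
On the point that a base64-encoded descriptor is always 7-bit ASCII: the following is a minimal sketch of standard base64 encoding, showing that arbitrary bytes (including 8-bit ones) come out as characters below 0x80. `Base64Encode` is a hypothetical helper written for this illustration. It is not the encoder used by nsPersistentFileDescriptor, whose exact alphabet and framing are not shown in this bug.

```cpp
#include <cstddef>
#include <string>

// Encodes arbitrary bytes with the standard base64 alphabet and '=' padding.
// Every output character is 7-bit ASCII regardless of the input bytes.
std::string Base64Encode(const std::string& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(bytes.data());
    const std::size_t n = bytes.size();
    std::string out;
    for (std::size_t i = 0; i < n; i += 3) {
        // Pack up to three input bytes into a 24-bit group.
        unsigned long v = static_cast<unsigned long>(p[i]) << 16;
        if (i + 1 < n) v |= static_cast<unsigned long>(p[i + 1]) << 8;
        if (i + 2 < n) v |= static_cast<unsigned long>(p[i + 2]);
        // Emit four 6-bit symbols, padding where input bytes were missing.
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += (i + 1 < n) ? kAlphabet[(v >> 6) & 0x3F] : '=';
        out += (i + 2 < n) ? kAlphabet[v & 0x3F] : '=';
    }
    return out;
}
```

Note that the standard alphabet contains '/', so a string like this is safe as a registry value but would still need care if it were ever used as a slash-separated key name.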
Status: REOPENED → ASSIGNED
Using filespec or any of its derivatives is not an option because of dependency
problems.

Now, I do have a simple half-ass solution. If I store filenames not as strings
but as bytes and retrieve them back, then I think all will be ok except for '/'
':' being problems in filenames. Other 8 bit characters would work as the
storage and retrieval will be in native encoding and NSPR can handle that.

I am going to test this theory. Stop me if you know it won't work.
You CANNOT store full paths, in any form, in the registry, otherwise you will
severely break Mac in a number of ways. For example, if the user renames their
hard disk, or renames a folder in the path to Mozilla, things will break.
Mac users routinely rename things and move stuff around on their hard drives.
We MUST be able to deal with these things.
Alternatively, you can wait a few days till I fix bug #5784, (making nsFileSpec
into a com interface) and then the dependency problem will be solved.
There are a lot of problems with doing any of these on the mac. I do understand the
right thing to do is use the nsFileSpec stuff. Trust me I use it a lot
elsewhere.

Simon, apart from all the problems you mentioned, there is one other problem:
special characters like a bullet etc cannot be in any filename that XPCOM
uses.

Storing the filename as a byte sequence won't fix all the problems you mentioned.
But it will fix the problem of not being able to put apprunner in a directory whose
full name has 8 bit characters.

So I thought doing that would be an improvement to the situation we are in. No ?
But dp, if I can COMMify nsFileSpec and friends, can't you then use this
(specifically, nsPersistentFileDescriptor)?
Even if nsFileSpec and co are COMified, there is going to be trouble making
xpcom.dll depend on code in base.dll  So I trust the only solution is to move
autoregistration and anything that deals with files out of xpcom. Moving autoreg
out is easy. But moving the dll loading part (which uses the filename)
out of xpcom is hard. I am stuck!
dp, why can't XPCOM use characters like bullets in file paths (if you were to
use file paths anywhere)?

We _have_ to get this right. It would not be appropriate for a build problem
(all of which are fixable) to hinder us using the correct solution to this
problem.
>Even if nsFileSpec and co are COMified, there is going to be trouble making
>xpcom.dll depend on code in base.dll

dp, I don't understand why. As long as base.dll isn't autoregistered, what is the
problem? If you call through a com interface, you don't have link
dependencies...I know I may be being rather stupid here, as usual...
have we figured out what to do on this one for M5?
See the comments in bug 4965 about a possible solution of storing only relative
paths in the registry.

But now I think about it, why do we have to store paths in the registry at
all? After all, we know where the components directory is, and that we can find
components in that directory. Is the plan that at some point, Netscape DLLs will
be scattered throughout the users system?
Whiteboard: investigating
Target Milestone: M5 → M6
Ok I checked. The filename becomes a key. So targeting the full solution for
M6.
Sorry, I don't understand what you mean by the "full solution". Do you mean
full paths, or a complete (alternative) solution?
I mean a complete solution for all the nitty gritty problems. nsIFileSpec is one
of the options... but...
If you plan on storing filenames in libreg using REGTYPE_ENTRY_BYTES please
consider using REGTYPE_ENTRY_FILE -- On windows and Unix these are equivalent,
but on the Mac the _FILE type stores a binary alias.
There was a reason why I didn't use that. Let me think...it was crashing on the
mac for some reason. Maybe robert would remember why we switched it from FILE to
STRING.
Ah... I don't recall exactly, but I think one of the issues was that (on Mac)
XPCom internally always is using Unix pathnames which hit some bug as the
registry really wants Mac style paths (with colons)

Switching XPCom over to using nsFileSpec et.al. might help with this.  :^)
QA Contact: 3853 → 1308
Target Milestone: M6 → M7
*** Bug 7029 has been marked as a duplicate of this bug. ***
This bug has lost us several hours of engineer and release team time, trying
to figure out why builds don't run, only to find a / somewhere in the file path.
It is imperative that this bug is fixed ASAP.
XPCOM 2.0 landing was step 0 for this.

I am targeting this fix for M7. Really!
Depends on: 3081
Target Milestone: M7 → M8
dp ready to land this first part of m8
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
This should be fixed with the XPCOM using nsIFileSpec changes
QA, to verify this bug (when fixed), you need to put 8-bit ascii chars, and
slashes in your directory names before calling it fixed.

You should also test the case of having a slash / in the name of one of the
folders containing Mozilla. If this does not work still, please open a separate
bug.
This should be classified as a cross-platform bug, I think.
8-bit characters in path names should fail for Unix and Windows
without this fix. (if you agree, please change the platform.)

The "/" would be Mac-specific, the bullet is not.  I'll try other
MacRoman specific characters also.

Is there any character in the Windows-1252 (Latin 1 for Windows) and MacRoman
tables which cannot be dealt with with this fix? -- just so that we know
what the limitations are going in.
Moving all Apprunner bugs past and present to Other component temporarily whilst
don and I set correct component.  Apprunner component will be deleted/retired
shortly.
Status: RESOLVED → REOPENED
OS: Mac System 8.5 → All
Hardware: Macintosh → All
** Checked with 6/25/99 Win32 and Mac M8 builds **

On Mac:

1. Starting up is OK if superdirectory names contain bullet or other 8-bit
    characters used in Latin 1.
2. Start-up succeeds with JPN super-directory names.
3. But the slash makes the start-up fail.
4. Mozilla Preference does not create a pref directory if the name
   put by the user contains 8-bit characters or slash. Thus even though
   you created a pref directory, you are asked to create a new one
   again on the next start-up. If you have an existing pref. directory
   with 8-bit characters in its name, Mozilla does not recognize
   the directory -- my pre-set pref settings were not honored.

On NT4-Japanese:

1. Start-up fails if super directory names contain Japanese characters.

----

I asked this question before, but we need a cross-platform fix for this.
Therefore I changed the Platform and OS to all, copied bobj on this.

I have a few questions about file systems.

A. Does the Mac OS 8.5 or above file system use Unicode/UCS-2 for storing string
   data (e.g. path & file names)?
     Is what used to be called "Unicode Imaging" service being used?

B. Does NT4 use Unicode/UCS-2 for the same?
     Essentially UCS-2?
C. Does Win95 use Unicode/UCS-2 for the same?
     Locale specific charset?
D. Does Unix use Unicode/UCS-2 for the same?
     Locale specific charset except in Unicode locales?

Re-opening this bug because it is not fixed ...
Resolution: FIXED → ---
Simon, I didn't understand your comment about opening another bug
about the slash. You want to create another bug for the slash bug
because it's more of a file system specific bug? Would it be better
to create new bugs for Mac items 3 and 4 above. I haven't done this
yet but I suspect that Japanese profile directory names will probably
fail on Windows also.
One more question:

F. Does Japanese MacOS 8.5/later use the same file system charset code
   as the US OS 8.5/later?
Status: REOPENED → ASSIGNED
Target Milestone: M8 → M9
Target Milestone: M9 → M10
Assignee: dp → hyatt
Status: ASSIGNED → NEW
I tried special characters in filenames in unix and I got navigator to show up. Some
of the icons for the chrome didn't show up. Otherwise things are fine.

nsNativeComponentLoader: autoregistering /home/dp/tmp/50®special/bin/components
nsNativeComponentLoader: autoregistering succeeded

That's a start. So maybe the xul icon thing is not 8bit clean.
I know that string data and key names in the registry MUST be UTF8. Maybe
you're finding those .DLL's through autoreg each time, but they're not getting
stored in the registry with 8-bit chars in the name.

Or the prog-id for that matter...  I have another bug open on ftang to
implement a "ToUTF8()" in nsString, which will then make it easy to do the
right thing.
Adding dp back to the CC field, I think this is/will be a component Manager
problem.
Sounds like a dup of 10373 now -- URLs don't deal with non-ASCII.
Nope, URLs don't get stored in the registry. Different problem.
dveditz: I was responding to dp's comments:

> I tried special characters in filenames in unix and I got navigator to show up. Some
> of the icons for the chrome didn't show up. Otherwise things are fine.
>
> That's a start. So maybe the xul icon thing is not 8bit clean.
Blocks: 13276
Assignee: hyatt → dp
reassigning to dp, not clear why this was given to hyatt/xptoolkit
Status: NEW → ASSIGNED
Target Milestone: M10 → M15
We are storing relative pathnames in the registry. Hence this won't be an issue
until people start storing full pathnames that aren't in our distribution.
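
As a rough illustration of why relative pathnames sidestep the problem, here is a minimal sketch that strips the components directory prefix before storing and re-attaches it on load. `ToStoredPath` and `FromStoredPath` are hypothetical names, and the sketch assumes unix-style paths with '/' separators; it is not the component manager's actual code. Whatever 8-bit characters appear in the parent directories never reach the stored string, because only the part below the components directory is kept.

```cpp
#include <string>

// Returns `dir` with exactly one trailing '/' so prefix tests are exact.
static std::string WithTrailingSlash(const std::string& dir) {
    if (!dir.empty() && dir[dir.size() - 1] == '/') return dir;
    return dir + '/';
}

// If `fullPath` lives under `componentsDir`, return the part below it.
// Otherwise return `fullPath` unchanged (an "outside" component).
std::string ToStoredPath(const std::string& componentsDir,
                         const std::string& fullPath) {
    const std::string prefix = WithTrailingSlash(componentsDir);
    if (fullPath.compare(0, prefix.size(), prefix) == 0)
        return fullPath.substr(prefix.size());
    return fullPath;
}

// Inverse: absolute stored paths are used as-is, relative ones are
// resolved against the components directory known at startup.
std::string FromStoredPath(const std::string& componentsDir,
                           const std::string& stored) {
    if (!stored.empty() && stored[0] == '/') return stored;
    return WithTrailingSlash(componentsDir) + stored;
}
```

The fallback branch in `ToStoredPath` is where full pathnames from outside the distribution would still end up in the registry, which is the remaining case this comment refers to.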
It's still an issue if people are silly enough to put 8-bit characters in their
component name. But I guess that's a self-limiting problem because it won't
register, so they won't be able to test, so they'll change the name before they
ship it.
They probably won't put down component name as 8-bit char, but don't
forget the case that they may install SeaMonkey under c:/MyCòmpùtêr/Nétsãpë/
That problem is avoided by the relative pathname support dp mentioned above.
XPCOM stores relative pathnames for our components. So this will hit us as an
issue only if full pathnames are registered from outside components. PSM is
going to be the first.

The right solution is:
1. Get persistentDescriptor from nsIFile (avoids mac renaming harddrive issue)
2. Store the resulting filename as UTF8 strings in the registry.

(1) already happens (need to check). (2) needs a converter.
Target Milestone: M15 → M16
Ray, wanna have a crack at this.
Assignee: dp → rayw
Status: ASSIGNED → NEW
I have been trying to isolate the separate issues in this one that look enough 
like the same problem that they were included in the same report.  Please 
correct me where I am in error.

1.  Mozilla code frequently assumes unix-style paths, such that a slash or other 
certain characters in the directory (or probably module) names will cause the 
code to screw up.  The offending code could be in a single place, or scattered 
throughout mozilla.  I will try to track it down.  But it seems to me that there 
is nothing that storing UTF-8 or native characters solves, because in most 
systems "/" is a simple seven-bit character whether using UTF8, ASCII, or some 
other 8-bit or otherwise-encoded character set.

2.  Mozilla needs to store file path strings in either UTF-8 or in a native 
format.  It is not clear to me why it needs to be UTF-8, because if you are just 
returning the bits to the system that it gave you, there should be no problem 
(except the separate problem identified as 1.).  

If there is, or it is anticipated that there will be in the future some GUI or 
other manipulation of the filename for which UTF-8 is required (or if the 
registry itself is better-equipped to handle UTF8), then two conversions 
probably need to be added:  one where the data is written to the registry, and 
one when it is gotten from the registry.  Or, if the code, as it currently 
exists, requires UTF-8 in some places (such as for filename manipulation), then 
the conversion needs to be done.  But I doubt it will fix the slash problem in 
the reported situations.
There is apparently a method to convert characters from the filesystem charset 
to unicode, but there is no method to convert back.

There are also apparently no existing methods to save or restore byte streams 
that are not UTF-8.

Neither of these problems is hard to solve, but I regret having to add new 
interfaces, either way I solve the problem -- storing native paths or storing 
UTF-8.
libreg stores data as UTF-8. ASCII just happens to be a subset of UTF-8 so it 
works, but if there were a component with a non-ASCII character in the name 
(including the directory name for a non-relative component) then it had better 
be encoded as UTF-8.  nsRegistry now supports a Unicode API which will 
automatically convert to UTF-8 storage for convenience.  Note that RDF has the 
same limitation on key names -- they must be UTF-8 encoded.

But frankly the problem is that files are being stored as strings in the first 
place. Since the component.reg uses the key name as the data you have little 
choice about that, but the component names could also be stored as data values 
of a key instead. This would give you the flexibility to use different types 
which might be more appropriate, such as raw bytes, or some nsIFile persistent 
format which would give you some hope of handling non-relative components 
correctly on the Mac.
> Note that RDF has the same limitation on key names -- they must be UTF-8 
encoded.

Actually, that is incorrect.  :^)
First, let's make it clear that what really needs to be solved is local non-URL 
filenames in general.

Not only are these file specifications being stored as strings, but the Mozilla 
code is manipulating them as strings.  It is not clear to me how many other 
places use them.

If we declare that these file specifications are not strings (because we don't 
know the characters), then they cannot be manipulated as strings, and every 
place that Mozilla currently manipulates them as strings must be replaced by a 
call to a native method that knows how to achieve the desired result on the 
specified platform.  And no need for the "/" <--> "\" substitutions, etc., 
because normal code will never touch the path or search for specific characters, 
and the native code that does knows the right characters.  There then needs to 
be a new registry method allowing storing native filenames into the registry.  
There appears to be low-level support for this still in the registry, but the 
high-level interface methods are missing, and apparently this caused some kind 
of problem in the past.  Even with this approach, there may occasionally be a 
need to display the string or allow the user to manipulate it so conversion 
to/from unicode / UTF8 is needed.

The alternative is that file interfaces could be modified such that native code 
performs character conversions such that every file specification passed in or 
out of the interfaces is UTF-8.  The problems of differing filename syntaxes 
still persist in this case.  The current code likes to translate filenames to 
unix-style before manipulating them, and this is a technique that would still 
work with UTF8.  A platform for which slash is valid in a filename could escape 
it, making these transformations bullet-proof enough, IMO.  Whenever the 
filename is displayed to the user, it probably should be displayed in the 
non-unix form, so the conversion of UTF8 between native syntax and unix syntax 
still needs to be requestable.  The default syntax should be whatever it is now.

So, the choices, as I see them, are:

Filename specifications should be exposed:

1.  Native non-strings with explicit platform-specific manipulation methods.
2.  UTF8, with explicit methods to go between unix syntax and native syntax.

I like solution 2, because it does not deny the common assumption that file 
specifications are strings.  It is also easiest, because it allows us to keep 
most code the way it is, manipulating paths as strings rather than calling 
native utilities for all path / file specification manipulation, and storing 
them as currently-supported UTF8 strings in the registry.  But platform-specific 
manipulation methods could have advantages in certain cases, where it is not 
easy to transform a native filename into a unix filename that exhibits all the 
desired behaviors as it is manipulated as a string.
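
The escaping idea in option 2 can be sketched roughly as follows: when a colon-separated Mac path is translated to unix syntax, any '/' inside a folder name is percent-escaped so it cannot be mistaken for a separator, and the translation is reversible. `MacToUnixStyle` and `UnixStyleToMac` are hypothetical helpers, the choice of "%2F" and "%25" as escapes is an assumption made for this illustration, and only full paths of the form "Volume:Folder:File" are handled.

```cpp
#include <cstddef>
#include <string>

// Escapes '/' and '%' inside one Mac path component so the component can
// be embedded in a '/'-separated path without ambiguity.
static std::string EscapeComponent(const std::string& name) {
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (name[i] == '/') out += "%2F";
        else if (name[i] == '%') out += "%25";
        else out += name[i];
    }
    return out;
}

// "My/Disk:Mozilla:components" -> "/My%2FDisk/Mozilla/components"
std::string MacToUnixStyle(const std::string& macPath) {
    std::string out, component;
    for (std::size_t i = 0; i <= macPath.size(); ++i) {
        if (i == macPath.size() || macPath[i] == ':') {
            if (!component.empty()) {
                out += '/';
                out += EscapeComponent(component);
            }
            component.clear();
        } else {
            component += macPath[i];
        }
    }
    return out;
}

// Inverse of MacToUnixStyle: '/' becomes ':', escapes are undone.
std::string UnixStyleToMac(const std::string& unixPath) {
    std::string out;
    std::size_t i = 0;
    while (i < unixPath.size()) {
        if (unixPath[i] == '/') {
            if (!out.empty()) out += ':';   // the leading '/' has no colon
            ++i;
        } else if (unixPath.compare(i, 3, "%2F") == 0) {
            out += '/';
            i += 3;
        } else if (unixPath.compare(i, 3, "%25") == 0) {
            out += '%';
            i += 3;
        } else {
            out += unixPath[i];
            ++i;
        }
    }
    return out;
}
```

This addresses only the separator problem from item 1. It says nothing about character encoding, which is the independent issue in item 2.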
We already have a way to manipulate filenames without touching them as 
a path string: nsIFile and nsILocalFile do all the platform-specific magic. 
Unfortunately for this particular problem nsILocalFile does not support a 
"persistent string" encoding as its predecessor nsFileSpec did (Doug, is there 
a reason or just ran out of time?), but if it did that could be stored in the 
registry.

We could also now fairly easily expose storing filetypes as registry data 
values in nsIRegistry because JS could now pass nsILocalFile objects (we could 
implement a nsILocalFileMac to get at the Mac alias stuff we need). To take 
advantage of that, however, you'd have to change the structure of how the 
component manager stores components -- they'd have to be data values and not 
keys. Come talk to me if you're interested in going that route.
I have now tried storing file path strings as UTF-8, and it cannot be 
reasonably done at this time, since the components necessary for mapping between 
the many code pages of file systems and Unicode have not been loaded yet (which 
is the reason we need the settings).

My next attempt is to create the registry methods for registration of binary 
stuff.
Status: NEW → ASSIGNED
I added a comment, which must not have saved properly, designating this as 
mostly-fixed, with the exception of the JS Component loader, which I do not know 
how to test yet.
Now I put back the js component fix, too, so I am marking this fixed.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → FIXED
Belatedly, I'm going to verify this fix.
If you look in registry.dat or mozregistry.dat when you have
profiles in Japanese, you see familiar UTF-8 3-byte sequences
for most of these Japanese characters.
This is good enough for me to verify this fix.
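
For anyone repeating this check, the 3-byte sequences follow directly from the UTF-8 encoding rules: code points from U+0800 to U+FFFF, which covers kana and most kanji, take three bytes. A minimal sketch of the encoding step is below. `EncodeUtf8` is a hypothetical helper for illustration only; it does not reject surrogates or values above U+10FFFF, and it is not the converter used by the registry code.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8. No validation is performed.
std::string EncodeUtf8(unsigned long cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx then 3 x 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+65E5 (the kanji 日) encodes to the bytes E6 97 A5, and the bullet U+2022 encodes to E2 80 A2, so lead bytes in the E0 to EF range are what to look for in the file.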
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard