Closed Bug 100676 Opened 23 years ago Closed 22 years ago

xpcom depends on uconv for platform charset info

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: alecf, Assigned: alecf)

References

Details

(Keywords: embed, topembed-, Whiteboard: fix in hand, need reviews)

Attachments

(8 files, 27 obsolete files)

11.60 KB, patch
Details | Diff | Splinter Review
7.45 KB, patch
ccarlen
: review+
waterson
: superreview+
Details | Diff | Splinter Review
34.96 KB, patch
sfraser_bugs
: review+
Details | Diff | Splinter Review
5.83 KB, patch
dveditz
: superreview+
Details | Diff | Splinter Review
1.21 KB, patch
Details | Diff | Splinter Review
10.54 KB, patch
Details | Diff | Splinter Review
7.54 KB, patch
shaver
: approval+
Details | Diff | Splinter Review
467 bytes, patch
mcafee
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
yuck.
upon investigation, these are the only files using this stuff:
nsFileSpec.cpp: for nsIPlatformCharset.h
nsLocalFileCommon.cpp: for nsIPlatformCharset.h
nsUnicharInputStream.cpp: for nsICharsetConverterManager.h
Blocks: 66759
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.6
As far as the dependencies between nsFileSpec and nsLocalFileCommon and uconv,
the way around this is to avoid the conversion altogether - use Unicode FS APIs
where supported (WinNT/2000, Mac OS9+), and have nsIFile be Unicode-only (bug
94323). For file systems which are not natively Unicode, FS <-> Unicode name
mapping can't be avoided but could be done with OS routines - Pete is working on
that for Unix and I've done it already for the Mac. 
Depends on: 94323
Priority: -- → P2
Depends on: 103384
yay, people are working on this for bug 94923..

*** This bug has been marked as a duplicate of 94923 ***
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
alecf@netscape.com: 
I don't think your dupe is correct !
Do you mean bug 94323 ?
I don't think this is a dup of bug 94323 but instead should block it. Changing
to Unicode-only APIs on nsIFile will put demands on the implementation of
nsLocalFile for each platform. Using uconv to do the conversion has problems and
we'll be forcing much more code through that point. I think what should happen
first is that we get Unicode file handling nailed down on each platform first,
then change the API.
talked with conrad.. reopening to keep this problems in small and managable
chunks...
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Status: REOPENED → ASSIGNED
Target Milestone: mozilla0.9.6 → mozilla0.9.7
no way I'll hit this for 0.9.6
Depends on: 110371
Depends on: 110531
nsFileSpec is so lame. I found more stuff in nsLocalFileUnicode (only found it
thanks to conrad!)

Anyway, I removed the operators, and fixed all the consumers which were using
them. 
since its mostly mailnews, I'll try to get sspitzer to review.
Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

ugh, I just realized there's a bunch of stuff from bug 110531 in there. I'll
try to land that first.
also, in that patch you'll note that in almost every case of using nsString, we
actually had an ASCII version of the filename that we were blindly converting to
unicode, and then back to a char*.. so no data is lost here.
Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

>Index: profile/pref-migrator/src/nsPrefMigration.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/profile/pref-migrator/src/nsPrefMigration.cpp,v
>retrieving revision 1.157
>diff -u -r1.157 nsPrefMigration.cpp
>--- nsPrefMigration.cpp	2001/11/07 06:23:53	1.157
>+++ nsPrefMigration.cpp	2001/11/30 22:20:29
>@@ -1353,7 +1353,7 @@
> }
> 
> static PRBool
>-nsStringStartsWith(nsString& name, const char *starting)
>+nsCStringStartsWith(nsString& name, const char *starting)

Shouldn't that be nsCString& then? Is this code used at all?

I'll continue when you're done with bug 110531, just wanted to comment on the
above.
you're right! that function is only used inside #ifdef
NEED_TO_COPY_AND_RENAME_NEWSRC_FILES - so I wrapped that function with it as
well, and updated it to nsCString

Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

r=bienvenu, on the mail/news changes.
Attachment #59919 - Flags: review+
Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

sr=sspitzer on the mailnews stuff, and the profile stuff.

we should definitely give a heads up to ji@netscape.com and
nhotta@netscape.com, so that nhotta can review and ji can check on a non
US-ASCII machine.
Attachment #59919 - Flags: superreview+
This is the same patch, but the tree changed slightly, so this one should apply
more cleanly
Attachment #59919 - Attachment is obsolete: true
adding nhotta and ji - is there any chance you guys can test this patch on a
japanese (or other non-european) system? The things we need to make sure work:
1) migrating from a 4.x profile, just make sure that messages in local mail
folders aren't lost
2) Compacting folders with non-Latin1 names doesn't erase the folder
(File->Compact Folders) on both IMAP and POP
3) Creating a new folder with a non-Latin1 name continues to work, on both IMAP
and POP

Thanks guys!
About the patch, how is that related to 'platform charset'?
well, nsFileSpec has an API that takes nsStrings, and the only way that API is
implemented is by using the platform (or system?) charset and then call the
appropriate char* method. We're trying to remove calls to uconv from xpcom, and
instead place that burden on the consumers.
> We're trying to remove calls to uconv from xpcom, and
> instead place that burden on the consumers.
I see, like this. Looks fine for me.

+    nsXPIDLCString nativeFolderName;
+    ConvertFromUnicode(nsMsgI18NFileSystemCharset(), nsAutoString(folderName),
+                       getter_Copies(nativeFolderName));
+    
+
path += nativeFolderName.get();

well, I still need to remove the old nsFileSpec APIs, so until I get to that,
this bug ain't going nowhere. moving to 0.9.8 :(
I know, I'm as disappointed as you are.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Since the last patch was checked in, we had two more consumers of these
routines added. this patch fixes both consumers, and removes the evil operators
and methods all together.
can I get a r=conrad and sr=jag?
Comment on attachment 64162 [details] [diff] [review]
fix 2 more usage, and remove unicode operators completely

Glad to see them go. r=ccarlen
Though the r= stands, I can never miss an opportunity to encourage killing off
an nsFileSpec usage ;-)

This one:

 nsSpecialSystemDirectory
dtdPath(nsSpecialSystemDirectory::OS_CurrentProcessDirectory);
-        dtdPath += NS_ConvertASCIItoUCS2(nsDependentCString(kDTDDirectory) +
fileName);

begs for replacement by directory service and nsILocalFile. Then, you could just
use nsIFile::Append(const char*) 
Comment on attachment 64162 [details] [diff] [review]
fix 2 more usage, and remove unicode operators completely

sr=waterson
Attachment #64162 - Flags: superreview+
conrad - I'll get to that at some point I swear :) I just keep cleaning up the
tree to get rid of these consumers, and people keep adding them back...in any
case the above patch has been checked in. thanks guys.
*sigh* 0.9.8 was way to short. moving all my bugs to 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
This patch uses nsFSStringConversionMac which I put into nsLocalFileMac a while
back. This object uses native conversion facilities so uconv is not needed at
all. Notice nsLocalFileUnicode is gone from the project :-) Aside from no uconv
dep, this won't really pay until bug 95481 is done. With that, there will be no
conversion of any kind, except on pre-OS9 systems.
Comment on attachment 65780 [details] [diff] [review]
Patch so nsLocalFileMac doesn't use uconv for conversion

Awesome! Let's get a mac person to review this, but the concept looks great to
me.

sr=alecf
Attachment #65780 - Flags: superreview+
Regarding attachment 65780 [details] [diff] [review]:
Hrmm, ugly macro foo. The macros hide return logic etc, which is unfortunate. 
Isn't there a better way to do this?
OS: Windows 2000 → All
Hardware: PC → All
> Isn't there a better way to do this?
Yeah - expand the macros ;-) I based this on the code here:
http://lxr.mozilla.org/mozilla/source/xpcom/io/nsLocalFileUnicode.cpp#90, but
cut down the number of macros. I'll just get rid of them in a new patch.
Patch does two things:
(1) Gets rid of macros
(2) GetUnicodePath(), if HFSPlus is available, doesn't just call GetPath() and
convert that to Unicode. With HFSPlus, the extant portion of the path is gotten
directly in Unicode in the first place. So, I made a static utility function
HFSPlusGetRawPath() which is used by both versions of GetPath(). That involved
moving (but not changing) some code. The unicode conversion class had to be
moved up in the file. I moved the path parser object along with it.
I'm describing this because diff made a mess out of the diff. It's hard to tell
what happened. The old equivalent of diff in MPW had a great feature where you
could specify the minimum number of matching lines before it considered the two
to be "in sync" I guess diff doesn't have that and a simple matching brace
somewhere will cause it to end the non-matching run.
Attachment #65780 - Attachment is obsolete: true
Naoki, can you apply the patch in attachment 66286 [details] [diff] [review] and try it out on a Japanese
system?
I applied the patch to Classic build and ran on MacOS 9.1 JA.
I tested creating JA profile name, saved .html in JA file name, opened .html in
JA file name, I did not see problems.
Simon, based on Naoki's testing on OS 9.1 JA, I'd say it's good. Can you r=? The
only difference between 9.1 and OSX is, on OSX, the direct-from-unicode path is
used. But, that code has been in place in GetPath() for a while now so I'm
pretty sure of it.
Attached patch patch for nsLocalFileWin.cpp (obsolete) — Splinter Review
working off of conrad's example, I've done the windows work using the windows
system API's |wcstombs| and |mbstowcs|

Looking for reviewers...
nhotta, would you mind trying the windows patch, attachment 66916 [details] [diff] [review], for me, like
you did for conrad on the mac? thanks.

pete, do you have similar changes for unix?
Alec, yea I did this a while ago. I need to break it out of bug 94323 and
provide a patch here.

I should be able to get to this tonight.

--pete
Mozilla usually use WinAPI WideCharToMultiByte and MultiByteToWideChar instead of 
wcstombs and mbstowcs. Those WinAPI might have more coverage for charsets than
the runtime library functions, also that's more consistant with other parts of
the code in Mozilla.
I'll try the patch by setting the default system locale to JA on my Windows 2000.
Attached patch nsLocalFileUnix patch (obsolete) — Splinter Review
I would need a Japanese machine to test out the unix code as well. 

After applying the patch, you need to run autoconf, reconfigure and compile.

Thanks

--pete 
Attached patch windows fix using Win32 API (obsolete) — Splinter Review
Naoki - how does this look? I switched to the Win32 API (i.e.
WideCharToMultiByte and MultiByteToWideChar)
Attachment #66962 - Attachment is obsolete: true
Comment on attachment 66962 [details] [diff] [review]
nsLocalFileUnix patch

oops.. sorry I obsoleted pete's patch. ignore that.
Attachment #66962 - Attachment is obsolete: false
Attachment #66916 - Attachment is obsolete: true
Comment on attachment 66286 [details] [diff] [review]
Better patch for nsLocalFileMac.cpp

sr=alecf on the mac patch btw, assuming a mac person reviews it.
Attachment #66286 - Flags: superreview+
adding dveditz for super review on attachment 66977 [details] [diff] [review]
ccarlen or nhotta, do one of you mind reviewing it?
Comment on attachment 66977 [details] [diff] [review]
windows fix using Win32 API

Both CopyTo* and MoveTo* can take null as the name param. Check for that first.
cc'ing brendan to see if he can look at the unix patch.

--pete
Ok, 3rd time's the charm :)
Attachment #66977 - Attachment is obsolete: true
wcstombs and mbstowcs not working fine on my environment (Windows2000 default
charset set to JA). mbstowcs seems to infrate a byte to two bytes by casting and
wcstombs does the other way. They might need localized systems to work properly.
I will try the latest patch.
With the latest patch (id=66987), I was able to create and delete a profile in
JA name, also able to save a file in JA file name, tested on Windows 2000
default system set to JA.
Whiteboard: fix in hand, need reviews
Attachment #66962 - Attachment is obsolete: true
Comment on attachment 66962 [details] [diff] [review]
nsLocalFileUnix patch

>Index: configure.in

cls@seawood.org should review the configure.in change.

>Index: xpcom/io/nsLocalFileUnix.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/xpcom/io/nsLocalFileUnix.cpp,v
>retrieving revision 1.83
>diff -u -r1.83 nsLocalFileUnix.cpp
>--- xpcom/io/nsLocalFileUnix.cpp	2002/01/28 07:02:02	1.83
>+++ xpcom/io/nsLocalFileUnix.cpp	2002/01/29 22:26:19
>@@ -1536,3 +1536,168 @@
>     *result = file;
>     return NS_OK;
> }
>+
>+static nsresult 
>+UCS2toFS(const PRUnichar *aBuffer, char **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+    char str[BUFSIZ];                     
>+    sys_wchar_t wpath[BUFSIZ];           
>+    size_t len = 0;                     
>+    while (aBuffer[len] != L'\0') {    
>+        wpath[len] = aBuffer[len];          
>+        ++len;                         
>+    }

Argh, potential buffer overrun here, plus a big stack buffer (BUFSIZ is good
for stdio, who says it's right for this use?).	You should call nsCRT::strlen
on aBuffer, and use a sys_wchar_t smallbuf[64] if possible, else malloc.

Nit: lots of random trailing spaces on the above lines -- :%s/	*$// please!

>+    wpath[len] = L'\0';             
>+    if (wcstombs(str, wpath, BUFSIZ) < 0 )   
>+        return NS_ERROR_FAILURE;              
>+
>+    *aResult = nsCRT::strdup(str);

Is there no way to tell how much space the mbs version of the wcs will be?  Can
you allocate str dynamically to worst-case size, then realloc downward?  You're
going to have to allocate dynamically anyway.
>+static nsresult 
>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+    sys_wchar_t wpath[BUFSIZ];
>+    if (mbstowcs(wpath, aBuffer, BUFSIZ) < 0)           
>+        return NS_ERROR_FAILURE;                           
>+    size_t len = 0;                                     
>+    PRUnichar wbuf[BUFSIZ];

Same comments here.

/be
Attachment #66962 - Attachment is obsolete: false
Attachment #66962 - Flags: needs-work+
What about this?

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);
    size_t bufsiz = 64;
    if ((nsCRT::strlen(aBuffer) / 8) > bufsiz) {
        NS_ERROR("path is to large . . .\n");
        return NS_ERROR_OUT_OF_MEMORY;
    }
    sys_wchar_t *wpath = (sys_wchar_t *)nsMemory::Alloc(bufsiz);
    if (!wpath)
        return NS_ERROR_OUT_OF_MEMORY;
    size_t len = 0;
    while (aBuffer[len] != L'\0') {
        wpath[len] = aBuffer[len];
        ++len;
    }
    wpath[len] = L'\0';
    char *str = (char *)nsMemory::Alloc(bufsiz);
    if (!str)
        return NS_ERROR_OUT_OF_MEMORY;
    if ((len = wcstombs(str, wpath, bufsiz)) < 0 ) {
        nsMemory::Free(wpath);
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = nsCRT::strndup(str, len);
    if (!*aResult) {
        nsMemory::Free(wpath);
        nsMemory::Free(str);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    return NS_OK;
}
according to the manpage, the wcstombs and mbstowcs return the number of bytes
required.. take a look at my attachment 66916 [details] [diff] [review] for how I did it over there.

Since we need to allocate space for the result, we might as well get the right
number of bytes - I don't see a need for a temporary buffer.
alecf, don't we need a copy from PRUnichar array to sys_wchar_t array?  Those
types may not be the same size.  That's why I mentioned a small stack buffer to
avoid malloc'ing all the time -- but now that you mention it, we could autoconf
a macro by which to test whether PRUnichar is smaller than sys_wchar_t.

We could even do a runtime if-else test conditioned by (sizeof(PRUnichar) !=
sizeof(sys_wchar_t)) and let the compiler eliminate dead code (probably it would
warn).

BTW, your cited patch doesn't check for OOM.

Pete, I don't think this is right at all:

    size_t bufsiz = 64;
    if ((nsCRT::strlen(aBuffer) / 8) > bufsiz) {
        NS_ERROR("path is to large . . .\n");
        return NS_ERROR_OUT_OF_MEMORY;
    }

The while loop that follows is really a for loop, too.  I think the L'\0' is not
quite right when testing a PRUnichar, either.  And don't strndup str, it's
already allocated and ready to return to the caller.

Here's my version:

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);

    sys_wchar_t *wpath, wbuf[64];
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAS(sys_wchar_t *, nsMemory::Alloc(size));
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        for (size_t i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }

    size_t nbytes = wcstombs(nsnull, wpath, 0) + 1;
    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
    if (!str) {
        if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    nbytes = wcstombs(str, wpath, nbytes);
    if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}

/be
Comment on attachment 66987 [details] [diff] [review]
nsLocalFileWin.cpp v3, with null checks for CopyTo*/MoveTo

r=carlen
One suggestion though: Break statements like

+    if (!newName) return CopyTo(newParentDir, nsnull);

onto two lines. Just in case somebody might want to set a breakpoint on that
condition.
Attachment #66987 - Flags: review+
Brendan, I had to make some changes to get the code to compile, not warn and
then work, respectively.

> size_t nbytes = wcstombs(nsnull, wpath, 0) + 1;

Doesn't fly, you will always get "nbytes = 0".

You will notice i allocated a default buffer size for "str" which you are not
going to like.
Please advise on the best thing to do there.  
Not sure if you will like 'wpathIsAlloc' either . . .


static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);
    NS_ENSURE_ARG_POINTER(aResult);
    sys_wchar_t *wpath, wbuf[64];
    PRBool wpathIsAlloc = PR_FALSE;
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, (PRUnichar*)aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(size));
            wpathIsAlloc = PR_TRUE;
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        size_t i;
        for (i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }

    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(64));
    if (!str) {
        if (wpathIsAlloc)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    size_t nbytes = wcstombs(str, wpath, 64);
    nbytes = wcstombs(str, wpath, nbytes);
    if (wpathIsAlloc)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}
pete, wcstombs(3) says that if you pass null for the first (dest) param, the
third (n) param is ignored, and there is no limit on the returned byte count,
which can be used to allocate.  So I don't see why you need to pass 64 to the
first call, nor does that seem big enough in general.  The code alecf wrote does
what I sketched, too (attachment 66916 [details] [diff] [review]). 

Thanks for fixing my code, I was hacking in the bugzilla comment input textarea,
always dangerous.  I don't mind the wpathIsAlloc boolean, thought of that myself
but decided the uglier condition was ok to repeat, as the first clause is a
compile-time constant, and the wpath != wbuf is the important bit.  But either
is ok (except maybe call it wpathWasAlloced for grammatical purity of essence?).

/be
Oops also on the NS_REINTERPRET_CAST(sys_wchar_t *, aBuffer) where aBuffer was
const PRUnichar * -- but pete, you can't fix that with an old-style C cast and
not run the risk of someone passing a string constant in and this code crashing
trying to store into readonly memory.  I made the aBuffer param a pointer to
non-const here.

I also got rid of the NS_ENSURE_ARG_POINTER(aResult), it seems silly especially
in a static function.  The static-ness lets anyone inspect nsLocalFileUnix.cpp
only, in order to find all calls and see that they follow the XPCOM rules.  In
that vein, I thought it was ok to eliminate the const from aBuffer's type -- it
is the only way to avoid a copy when sizeof(sys_wchar_t) == sizeof(PRUnichar).

static nsresult
UCS2toFS(PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);

    sys_wchar_t *wpath, wbuf[64];
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(size));
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        size_t i;
        for (i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }

    size_t nbytes = wcstombs(nsnull, wpath, 0) + 1;
    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
    if (!str) {
        if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    nbytes = wcstombs(str, wpath, nbytes);
    if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}

/be
Brendan, on linux it says that.  ;-)

> If  dest  is  NULL, n is ignored, and the conversion proceeds as above, except
that
> the converted bytes are not written out to memory, and that no length limit
exists.

Not on FreeBSD. Can't tell you about other nX flavors  . . .

--pete       
Brendan, can you show me how to crash it using (PRUnichar *) cast?
I can't get it to crash with the cast there.  It sucks having the compiler
warning there. 

But do we really need to use NS_REINTERPRET_CAST?

Can't we just do something like this?

    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
         wpath = (sys_wchar_t *) aBuffer;

and just make the copy since they are the same types in this particular case anyway?

--pete

Comment on attachment 66286 [details] [diff] [review]
Better patch for nsLocalFileMac.cpp

r=sfraser. But what will all this unicode conversion stuff do to our startup
performance?
Attachment #66286 - Flags: review+
Pete, I'm wrong: you should use NS_CONST_CAST just for C++ purity and leave the
aBuffer parameter type const PRUnichar *:

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);

    sys_wchar_t *wpath, wbuf[64];
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_CONST_CAST(sys_wchar_t *, aBuffer);
    } else {
 . . .

the NS_ENSURE_ARG could go to, for the reason I gave against
NS_ENSURE_ARG_POINTER(aResult).

/be
Is FreeBSD's wcstombs really busted?  News to alecf, I bet!

/be
simon: we're already paying that price right now, in fact we have the overhead
of going through our own Unicode decoders which are likely to be just slightly
slower because they are designed to handle arbitrary charset decoding/encoding. 
Yup, wcstombs appears to be busted on FreeBSD.

You can't do this:

len = wcstombs(0, wcs, 0);

len is -1

--pete
Here's the FreeBSD bug tracking the issue.

http://www.freebsd.org/cgi/query-pr.cgi?pr=17694

> This is a nonstandard extension.  It is not present in C89 or in at
> least the n869 draft of C99.

Interseting . . .

--pete
Brendan, this looks pretty FreeBSD specific, so i can just ifdef it for now.

--pete
From the log, it also sounds like ONLY FreeBSD is broken. (there is mention of
HP-UX, AIX, Irix, Solaris, Linux, and Tru64)

Can we have a configure test to see if this function is broken, and then have
some kind of temporary buffer only for the broken platform?
Very weird.  I just looked in the C99 standard, and it says nothing about 
counting the number of characters required if NULL is passed in as the 
destination pointer.

(just confirming Bruce Evans' claim)
Hence, 

> MSVC++ 6.0 does it too (might be a hint where it comes from :-( ).

heh

--pete
size_t wcstombs(char *s, const wchar_t *pwcs, size_t n);

Converts a sequence of codes that correspond to multibyte characters from the
array pointed to by pwcs into a sequence of multibyte characters that begins in
the initial shift state, and stores these multibyte characters into the array
pointed to by s. The conversion stops if a multibyte character would exceed the
limit of n total bytes or if a null character is stored.
Each code is converted as if by a call to wctomb , except that the shift state
of wctomb is not affected.
If a code is encountered that does not correspond to a valid multibyte
character, the wcstombs function returns (size_t) - 1 . Otherwise, it returns
the number of bytes modified, not including a terminating null character, if any. 
What about this?

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);
    sys_wchar_t *wpath, wbuf[64];
    PRBool wpathWasAlloced = PR_FALSE;
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_CONST_CAST(sys_wchar_t *, (sys_wchar_t *)aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(size));
            wpathWasAlloced = PR_TRUE;
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        size_t i;
        for (i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }
#if defined(__FreeBSD__)
    char tmpbuf[64];
    size_t nbytes = wcstombs(tmpbuf, wpath, 64);
#else
    size_t nbytes = wcstombs(nsnull, wpath, 0);
#endif
    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
    if (!str) {
        if (wpathWasAlloced)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    nbytes = wcstombs(str, wpath, nbytes);
    if (wpathWasAlloced)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}
Comment on attachment 66987 [details] [diff] [review]
nsLocalFileWin.cpp v3, with null checks for CopyTo*/MoveTo

>+    // add 1 for null termination
>+    size_t chars = ::WideCharToMultiByte(CP_ACP, 0,
>+                                         aBuffer, -1,
>+                                         NULL, 0, NULL, NULL) + 1;

Since you used -1 to have the routine calculate the length then it'll
include the null terminator.

>+    if (chars == 1)
>+        return NS_ERROR_FAILURE;

Therefore this should be a check for 0

>+    *aResult = (char*)nsMemory::Alloc(chars * sizeof(char));
>+    chars = ::WideCharToMultiByte(CP_ACP, 0,
>+                                  aBuffer, -1,
>+                                  *aResult, chars, NULL, NULL);

Have you tested that this works if the Alloc fails? Some msvcrt routines
(strcpy, etc) crash when given null pointers, although the docs do say
this routine checks that the two buffers aren't the same so maybe it
does handle null OK.

>+    if (chars >= 0)
>+        (*aResult)[chars] = '\0'; // null terminate
>+
>+    return NS_OK;

Since you've used -1 to get the length the null terminator is included
in the conversion. However, if chars *is* 0 you've got a problem so do
you still want to return NS_OK?

>+static nsresult FStoUCS2(const char* aBuffer, PRUnichar **aResult)

Ditto the above for this version.
Ok, 4th time's the charm? I had left in the bogus null-termination logic from
when I was using wcstombs - whoops! Anyway, this patch:
- checks its arguments for null
- checks memory allocation failure
- checks failure of the conversion routines

Reviews?
Attachment #66962 - Attachment is obsolete: true
Attachment #66987 - Attachment is obsolete: true
Attachment #67303 - Flags: superreview+
Comment on attachment 67303 [details] [diff] [review]
nsLocalFileWin.cpp v4, with better error checking

sr=dveditz
thanks guys - windows and mac have been checked in. Now we just need unix and
we're done! I'll try to get to the review today.
I can finish the unix patch if the last UCS2toFS method posted is the way to go
here.

--pete
pete, what if the string at aBuffer has 64 or more characters in it?

#if defined(__FreeBSD__)
    char tmpbuf[64];
    size_t nbytes = wcstombs(tmpbuf, wpath, 64);
#else
 . . .

Won't freebsd versions of Mozilla truncate the string brutally?

/be
Brendan, originally i was using BUFSIZ and you said it was to big. 

This is only a temporary buffer so we can allocate the real one after.

I'll go back to using BUFSIZ.

--pete
Pete, my point is that you are bounding the string conversion by the size of the
stack buffer.  My point against BUFSIZ (it's good for stdio i/o buffering, but
what does that have to do with your intended use?) still stands -- plus, I am
saying that you should find a way to have no arbitrary bound on the string
length, unless we use this only for strings that have some bound already -- such
as PATH_MAX (scaled by sizeof(sys_wchar_t) of course).

Is there such a bound on the strings that come in via aBuffer?  If not, how do
other FreeBSD programs use wcstombs without imposing arbitrary limits?  Do they
use wcstomb instead, iterating through an arbitrary length string using a fixed
size buffer?

/be
Attachment #67000 - Attachment is obsolete: true
Attached patch This would be the correct patch (obsolete) — Splinter Review
Attachment #68255 - Attachment is obsolete: true
Comment on attachment 68256 [details] [diff] [review]
This would be the correct patch

Retracting patch. I have some more things to clean fix.
Attachment #68256 - Attachment is obsolete: true
with the switch to gmake on win32, we re-gained this dependency, and started
building nsLocalFileUnicode on Windows again. This patch makes that and the
REQUIRES=uconv explicit on OS/2 and Unix.
Blocks: 117359
Comment on attachment 68380 [details] [diff] [review]
Full patch for unix - please review

how about a diff -bw so we can actually see what changed? :)
Attached patch diff -ubw so alec can read it (obsolete) — Splinter Review
cc'ing chris seawood so he can look over the changes to configure.in

Chris, can you look at the autoconf addition in attachment 68399 [details] [diff] [review].

Thanks

--pete
Comment on attachment 68399 [details] [diff] [review]
diff -ubw so alec can read it

Since you're not actually testing additional CFLAGS, you should use
AC_LANG_CPLUSPLUS & AC_LANG_C instead of using _SAVE_CC.  Fix that && r=cls.
Attachment #68399 - Flags: needs-work+
Attachment #68380 - Attachment is obsolete: true
Attachment #68399 - Attachment is obsolete: true
Attachment #68663 - Attachment is obsolete: true
Comment on attachment 68712 [details] [diff] [review]
changing bad variable name use 'aBufSiz' to 'bufLen' 

>+    // nbytes is simply the amount of characters in aBuffer + 1
>+    size_t nbytes = nsCRT::strlen(aBuffer)+1;
>+    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
>+    if (!str) {
>+        if (wpathWasAlloced)
>+            nsMemory::Free(wpath);
>+        return NS_ERROR_OUT_OF_MEMORY;
>+    }
>+    nbytes = wcstombs(str, wpath, nbytes);

But a multibyte string may need 1, 2, or 3 bytes per PRUnichar, so you still
may be underallocating str.  If you can't use the wcstombs extension (where
passing 0 in with a NULL result buffer causes it to return the required
nbytes), then I repeat that you might have to (might as well) use wcstomb
iteratively.

Or maybe you could use the worst case size.  Is that 3 * the number of
characters?  Or could it be more?  We need advice from i18n folks.

/be
Attachment #68712 - Flags: needs-work+
> Or maybe you could use the worst case size.  Is that 3 * the number of
characters?  Or could it be more?  We need advice from i18n folks.

I think PRUnichars are always double byte. 

I think we would want to double the amount of chars present not triple it.

size_t nbytes = (nsCRT::strlen(aBuffer)+1) * 2;

So if there are 10 chars present, we want to allocate a 20 byte max size.
Two bytes per char.

This is wrong only if PRUnichars can be more than 2 bytes on some systems. 

--pete






Sorry, we are allocating for sys_wchar not PRUnichar.

Ok, same is true just need to allocate for the variable system wchar size.

--pete
PRUnichar is always uint16, or two bytes.  The problem is that an "mbs"
(multi-byte string) may need three bytes to represent some PRUnichars, but only
two or one for others.  S-JIS is one such charset encoding.

The worst case for a charset encoding that can express up to 65536 characters
has to be more than two bytes for any encoding that uses a variable number of
bytes per character, because there needs to be at least one bit in the first (or
only) byte that distinguishes a one-byte code from a two-byte code -- therefore
not all two-byte codes are available to represent the high-byte-non-zero part of
Unicode.  So three is a minimum multiplier for any charset that can express up
to 65536 characters.

Cc'ing ftang, but also: pete, why not use wcstomb/wcsrtombs (I noticed wcstomb
from your FreeBSD man page; I just found the similar-seeming wcsrtombs on my RH
7.1 Linux system) iteratively?

/be
Yea, i'll iterate through the chars and find out how many bytes they are using
wctomb.

I actually had this implemented already and thought it was unneecessary code
because I kept getting the same number as strlen. 

Now i realize, thats because i have a single byte charset i'm using here to test.

I'll post a patch tomorrow.

Thanks

--pete 

Attached patch using wctomb to allocate str (obsolete) — Splinter Review
Attachment #68712 - Attachment is obsolete: true
Attached patch adding nbytes test for failure (obsolete) — Splinter Review
Attachment #68831 - Attachment is obsolete: true
> for(size_t i = 0; wpath[i] != L'\0'; i++) 

Brendan, ignore the lack of a space after the for i just fixed it.

for (size_t i = 0; wpath[i] != L'\0'; i++) 

--pete
Attached patch correct patch (obsolete) — Splinter Review
Attachment #68858 - Attachment is obsolete: true
Perhaps someone can apply this patch on a linux japanese box and we can move
closer to landing this.

Testing various flavors of n'x would be ideal before a check-in.

--pete
Comment on attachment 68860 [details] [diff] [review]
correct patch

actually, I think tmpStr is going to have to be even bigger. Linux defines it
as MB_CUR_MAX.. though I can't tell how much that is. UTF8 has up to 6 byte
characters, not sure about other charsets.

the linux manpage also states that "at most n bytes are written" - do we know
that "wctomb" will return 1 if we pass in a null byte? If not, I think we need
to adjust for the terminating null by incrementing nbytes.

I'm also not sure of the value of 
if (nbytes < 0) 
because nbytes is a size_t (which is probably unsigned) - won't this statement
always be false?
     If mbchar is NULL, the mblen(), mbtowc() and wctomb() functions return
     nonzero if shift states are supported, zero otherwise.  If mbchar is
     valid, then these functions return the number of bytes processed in
     mbchar, or -1 if no multibyte character could be recognized or converted.


So 

if (nbytes < 0) 

Is valid . . .

MB_CUR_MAX is 4 bytes on freebsd i think it can be either 2 or 4 bytes on linux.

tmpStr needs to be big enough for sys_wchar_t which the max value is 4 byte, 2
byte on some linux systems, so that should be fine as well.

I beleive we are still cool w/ this patch.

Apply and fly . . .

The true test is to run it on a non-single byte charset machine.


--pete


size_t is unsigned, and any size_t variable comparison < 0 will never be true
("degenerate unsigned comparison").

/be
Right i need to do:

if (nbytes == size_t(-1))

That aside, i think we can test out this code. I am blowing out my tree at the
moment so i will post a revised patch later.


--pete
BTW: I forgot to mention, if you apply this patch you will need to run autoconf
and reconfigure.

--pete
Attachment #68860 - Attachment is obsolete: true
pete: but what about using MB_CUR_MAX? What if the system charset has 6
bytes-per-char (like UTF8)

Yea, the compiler warns when i use MB_CUR_MAX.

warning: ANSI C++ forbids variable-size array `tmpStr'

How about i just do something like this:

// set a 6 byte max size
#define MB_CHAR_MAX 6

so it is clear why we are using 6 here.

BTW: I thought the largest size of wchar_t on unix is 4 bytes . . . .

-pete

I just looked up why that was happening. The problem is that MB_CUR_MAX actually
depends on your current, runtime locale....on linux it is defined as
(__ctype_get_mb_cur_max ())

So maybe you can just make it very large (like 16) and NS_ASSERT that MB_CUR_MAX
> 16.. I just don't think that 4 is sufficient...you gotta hope that 16 would be
enough, though :) Besides, stack is cheap - there's no excess runtime cost to
using a 16-byte buffer vs. a 4 or 6 byte buffer.

Also, this is MB cur max, not WC cur max - this is the maximum number of bytes
used to REPRESENT a single wchar_t in the current, probably multi-byte charset... 
Attached patch we should be getting close (obsolete) — Splinter Review
Attachment #69601 - Attachment is obsolete: true
Comment on attachment 69733 [details] [diff] [review]
we should be getting close

a few more things:
1) in UCS2toFS, explain why we're doing the for loop to determine the wctomb
length
2) in the assertion, explain why this is bad, so if someone hits the assertion,
they understand why, and can at least guess at a fix
3) add some whitespace, and lots of it! :) There isn't a single line of
whitespace in any of these functions, and it it just makes it hard to read..
you need whitespace between every significant block of code.

4) ack! according to the man page, mbstowcs returns the number of CHARACTERS,
not the number of bytes.. which means you're probably only allocating half as
much as you want in FStoUCS2 - that should clean up this:
+    if (sizeof(PRUnichar) == sysWideCharSize) {
+	 wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes));
+    } else {
+	 wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes *
sizeof(PRUnichar)));
+    }


because we know that we want to allocate nchars * sizeof(sys_wchar_t), no
matter what.
Attachment #69733 - Flags: needs-work+
I hope the compilers we use can figure out that the if condition is a compile
time constant:

+    if (sizeof(PRUnichar) == sysWideCharSize) {
+        wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes));
+    } else {
+        wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes *
sizeof(PRUnichar)));
+    }

But, size_t sysWideCharSize should be const, not a variable.  I would rather you
used a #define, actually -- or just spell out the sizeof(sys_wchar_t), it's not
that long or ugly.

/be
Attached patch a little bit closer . . . (obsolete) — Splinter Review
Attachment #69733 - Attachment is obsolete: true
Attachment #70443 - Attachment is obsolete: true
Comment on attachment 70450 [details] [diff] [review]
removed final size_t < 0 comparison 

ok, I find the use of "nbytes" to continue to be a little confusing (would have
preferred "nchars") but to each his own :)
I'll give my tentative r=alecf here, under the assumption that brendan will be
the sr=
Attachment #70450 - Flags: review+
Comment on attachment 70450 [details] [diff] [review]
removed final size_t < 0 comparison 

>+    size_t nbytes = 1; 
>+NS_ASSERTION(MB_CUR_MAX <= 16, "this system has exceeded the allocated 16 byte mbchar max");

Nit: indent assertions as you would any other code -- if overlong line results,
break after , in actual param list.

>+
>+    /** 
>+     *  here we are determining the number of bytes processed in
>+     *  tmpStr and increment nbytes to properly allocate for str
>+     */

Nit: doc-comment in a .cpp file is wasted effort, I think.

>+static nsresult
>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+
>+    if (nsCRT::strlen(aBuffer) == 0)
>+        return NS_OK;

Don't call nsCRT::strlen, it's going away soon.  Don't call strlen just to test
for an empty string, it can take too long on non-empty strings.  Just test if
*aBuffer == '\0'.

>+
>+    size_t nbytes = (nsCRT::strlen(aBuffer)+1) * sizeof(sys_wchar_t);

Nit: spaces around + ?

>+
>+    // add space for wchar null terminator
>+    nbytes += sizeof(sys_wchar_t);

Wait, didn't you just do that already with the +1 in the nbytes initializer
factor that's multiplied by sizeof(sys_wchar_t)?

>+    sys_wchar_t *wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(nbytes));
>+
>+    if (!wpath)
>+        return NS_ERROR_OUT_OF_MEMORY;
>+
>+    nbytes = mbstowcs(wpath, aBuffer, nbytes);
>+
>+    if (nbytes == size_t(-1)) {
>+        nsMemory::Free(wpath);
>+        return NS_ERROR_FAILURE;
>+    }
>+
>+    size_t len = 0;
>+    PRUnichar *wbuf;
>+
>+    // increment one for terminating null wide character
>+    ++nbytes;

Why not use + 1 in as you did earlier?	No point in ++'ing nbytes if you're
going to overwrite it here:

>+    nbytes = nbytes * sizeof(sys_wchar_t);

Do use *= if possible, but here I agree with alecf: the nbytes usage from
mbstowcs's call till here should be a new nchars var.

>+    wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes));

You don't need to set nbytes if you do the add and multiply in the actual param
to Alloc.

alecf: some OSes use UTF-8, not our UCS-2 simple form, for filenames.  Isn't
Linux moving that way?

/be
Attachment #70450 - Flags: needs-work+
brendan, i think you're wrong about javadoc being useless in cpp files, 
http://unstable.elemental.com/mozilla/build/latest/mozilla/xpcom/dox/classnsDeq
ue.html is the url i'd watch, note that it says at the bottom:

The documentation for this class was generated from the following files:
         nsDeque.h
         nsDeque.cpp

I don't know how long it takes for unstable to update, but as of now it doesn't 
have the updated deque comments.
timeless: I wrote that doc comments in .cpp files were "wasted effort", which is
not to say "useless".  Not only will few people check
unstable.elemental.com/mozilla and read them, the particular one that pete wrote
seems to me just a plain old comment, not worthy of extraction out of context
into some kind of documentation.  It uses no @ keywords, defines no
interface(s), etc.  Let's please not start using /** ... */ everywhere a major
C-style comment is appropriate -- and pete, consider using // comments even for
multi-liners -- this is C++ after all.

/be
I should have time to clean up this patch tomorrow.

Brendan, i am being consistent w/ all of the other multi-line comments in
nsLocalFileUnix.cpp that i cleaned up way back when.

I favor this style, i find it easier to read and cleaner looking. 

To be honest, in a perfect world all multi-line cpp comments should be uniform.
This particular file is uniform.

--pete
Attachment #70450 - Attachment is obsolete: true
Comment on attachment 70803 [details] [diff] [review]
updated patch w/ Brendan's suggested changes

transferring my r=alecf
Attachment #70803 - Flags: review+
Keywords: topembed
Perhaps since we have review and are now pending Brendans sr, it might be a good
time to apply this to a linux box setting the default system locale to JA.

--pete


*** Bug 41384 has been marked as a duplicate of this bug. ***
Keywords: mozilla1.0+
naoki, is there any chance you can test this patch on a japanese unix system
(Linux should be fine)
I don't have JA system. I think IQA can test if the binary is available.
Ok, i will produce a linux binary soon.

Alecf, we want to land this for 1.0 right?


--pete
yes! we definitely want to land in 1.0
Alec, do you have any API changes to either nsIFile or nsILocalFile, of so,
please cite and make this bug a blocker of  99160
No longer blocks: 66759, 117359
I don't, but don't remove those bug dependencies!
Blocks: 66759
where are we with this? I'm moving to 1.0 to keep it on the radar, but we've
really got to land this to get some time testing. brendan, we really need your
final review.
Target Milestone: mozilla0.9.9 → mozilla1.0
I haven't had time to build a linux binary for testing.
I'll try to do it today.


--pete
Pete, do you know what a doc-comment is?  This is one, and I do not think it
should be:

+    /**
+     *  here we are determining the number of bytes processed in
+     *  tmpStr and increment nbytes to properly allocate for str
+     */

Just remove the second * and it won't be a doc-comment (and you probably won't
want to have two spaces after the third, etc. stars).  Uniformity of comment
style is fine, but don't write doc-comments if you aren't writing extractable
javadoc.

/be
Comment on attachment 70803 [details] [diff] [review]
updated patch w/ Brendan's suggested changes

>@@ -1526,8 +1526,10 @@
> NS_NewLocalFile(const char *path, PRBool followSymlinks, nsILocalFile **result)
> {
>     nsLocalFile *file = new nsLocalFile();
>+
>     if (!file)
>         return NS_ERROR_OUT_OF_MEMORY;
>+
>     NS_ADDREF(file);
> 
>     if (path) {
>@@ -1537,6 +1539,302 @@
>             return rv;
>         }
>     }
>+
>     *result = file;
>+
>+    return NS_OK;
>+}

Speaking of uniformity, other functions below do not throw extra newlines
around such statements, and I don't see the point here.  Please minimize change
unless you're fixing some real readability problem (I don't see one here).

>+static nsresult
>+UCS2toFS(const PRUnichar *aBuffer, char **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+
>+    if (nsCRT::strlen(aBuffer) == 0)

I think I've said this before: don't call strlen to handle an empty string, it
wastes (possibly many) cycles on non-empty string cases where you need only
test (*aBuffer == '\0').

>+        return NS_OK;
>+
>+    sys_wchar_t *wpath;
>+    PRBool wpathWasAlloced = PR_FALSE;
>+
>+    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
>+        wpath = NS_CONST_CAST(sys_wchar_t *, (sys_wchar_t *)aBuffer);
>+    } else {
>+        // get the number of chars in aBuffer
>+        size_t bufLen = (nsCRT::strlen(aBuffer) + 1);
>+        // allocate number of chars * size of sys_wchar_t
>+        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(bufLen * sizeof(sys_wchar_t)));
>+        wpathWasAlloced = PR_TRUE;

Now, I see a readability problem (minor, fix if you agree): why not add a
newline to make an empty line before the // allocate number of chars....

>+        /**
>+         *  copy each char from aBuffer into sys_wchar_t buffer
>+         *  so we can pass the system wchar_t to wcstombs
>+         */

Another doc-comment that should be a regular C-style major comment.

>+    if (wpathWasAlloced)
>+        nsMemory::Free(wpath);
>+
>+    if (nbytes == size_t(-1)) {
>+        nsMemory::Free(str);
>+        return NS_ERROR_FAILURE;
>+    }
>+
>+    *aResult = str;
>+
>+    return NS_OK;
>+}

I don't see the point in the extra newline before the final return, esp. in
light of earlier style.

>+
>+static nsresult
>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+
>+    if (aBuffer == '\0')

Cool, but the same technique applies to the const PRUnichar* aBuffer case.

>+        return NS_OK;
>+
>+    size_t nbytes = (nsCRT::strlen(aBuffer) + 1) * sizeof(sys_wchar_t);
>+
>+    // add space for wchar null terminator
>+    // nbytes += sizeof(sys_wchar_t);
>+    sys_wchar_t *wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(nbytes));
>+
>+    if (!wpath)
>+        return NS_ERROR_OUT_OF_MEMORY;
>+
>+    size_t nchars = mbstowcs(wpath, aBuffer, nbytes);
>+
>+    if (nchars == size_t(-1)) {
>+        nsMemory::Free(wpath);
>+        return NS_ERROR_FAILURE;
>+    }
>+
>+    size_t len = 0;
>+    PRUnichar *wbuf;
>+
>+    wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc((nchars + 1) * sizeof(sys_wchar_t)));

Initialize wbuf at its declaration.

Test for out of memory (null wbuf after it's set) before using wbuf.

>+
>+    // copy each wchar_t to a PRUnichar
>+    while (wpath[len] != L'\0') {
>+        wbuf[len] = wpath[len];
>+        ++len;
>+    }
>+
>+    wbuf[len] = 0;
>+    nsMemory::Free(wpath);
>+    *aResult = wbuf;
>+
>+    if (!*aResult)
>+        return NS_ERROR_OUT_OF_MEMORY;

Too late, the code above (any use or set of wbuf[len]) would have crashed on
null wbuf, if OOM.

>+
>+    return NS_OK;
>+}
>+
>+NS_IMETHODIMP
>+nsLocalFile::InitWithUnicodePath(const PRUnichar *filePath)
>+{
>+
>+    NS_ENSURE_ARG(filePath);
>+
>+    if (nsCRT::strlen(filePath) == 0)

Here we go again... :-(.  Please scour the code for all such bogus strlen
calls.

>+        return NS_ERROR_INVALID_ARG;
>+
>+    nsXPIDLCString tmp;
>+    nsresult rv = UCS2toFS(filePath, getter_Copies(tmp));
>+
>+    if (NS_SUCCEEDED(rv))
>+        return InitWithPath(tmp);
>+
>+    return rv;

Don't test success and overindent the normal case.  Use NS_FAILED to decide to
early-return rv, then fall into a least-indented call forwarding to the
InitWithPath method.

>+}
>+
>+NS_IMETHODIMP
>+nsLocalFile::AppendUnicode(const PRUnichar *node)
>+{
>+    NS_ENSURE_ARG(node);
>+
>+    if (nsCRT::strlen(node) == 0)

Ditto.

>+        return NS_OK;
>+
>+    nsXPIDLCString tmp;
>+    nsresult rv = UCS2toFS(node, getter_Copies(tmp));
>+
>+    if (NS_SUCCEEDED(rv))
>+        return Append(tmp);
>+
>+    return rv;

Ditto2.

>+}
>+
>+NS_IMETHODIMP
>+nsLocalFile::AppendRelativeUnicodePath(const PRUnichar *node)
>+{
>+    NS_ENSURE_ARG(node);
>+
>+    if (nsCRT::strlen(node) == 0)

etc. -- I'll let you find the rest.

>+        return NS_OK;
>+
>+    nsXPIDLCString tmp;
>+
>+    if (tmp.IsEmpty())
>+        return NS_OK;
>+
>+    nsresult rv = UCS2toFS(node, getter_Copies(tmp));
>+
>+    if (NS_SUCCEEDED(rv))
>+        return AppendRelativePath(tmp);
>+
>+    return rv;

etc.2

>+}

More below this, not individually cited here.

/be
Attachment #70803 - Flags: needs-work+
>>+static nsresult
>>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>>+{
>>+    NS_ENSURE_ARG(aBuffer);
>>+
>>+    if (aBuffer == '\0')
>
>Cool, but the same technique applies to the const PRUnichar* aBuffer case.

Uncool -- shaver pointed out what my bleary eyes missed: no * before aBuffer,
making this a test of whether aBuffer is null!  Please fix that, and notice any
warnings (there should have been one here, I think -- your compiler mileage may
vary).

/be
CCing mkaply for OS2. This is in on Windows & Mac, almost there on Unix. Once we
have OS/2, the job is done.
the man pages (redhat 7.2) says that wcstombs and mbstowcs are not thread safe.
 this patch needs to use a PRLock to protect libc.  the man pages also make
reference to wcsrtombs and mbsrtowcs, which are reentrant versions of these
functions.
the current impl is also not threadsafe... we're probably just lucky right now
that no one is calling any of the unicode string methods on background threads.

if we switch to an UTF-8 API then these conversion methods will be run much more
often and most certainly from background threads (e.g., the file transport thread).
oh, and if we're going to put a lock around these calls, then why not allocate a
single global buffer of size (MAX_PATH * sizeof(wchar_t)) for this?
XPInstall runs on a separate thread and obviously does a lot of file stuff. What
conditions should I be testing to see if we are, in fact, in danger of threading
problems?
dveditz: if xpinstall calls any of the unicode string methods on a background
thread, you'd be in trouble.
The threading issue is on Unix only, right? The Windows code looks safe enough.

XPInstall currently calls the narrow nsILocalFile methods, but we've got
localization problems and were going to be converting to the Unicode forms. Are
the threading issues going to get fixed?
the threading issues must be fixed... also, it looks like OS/2 uses
nsLocalFileUnicode, so it'll need fixing as well... or OS/2 will need its own
customized unicode <-> fscharset conversion code.
topembed-
this is good for stand alone xpcom.
- by triage team (chofmann, cathleen, marcia)
Keywords: topembedembed, topembed-
So you want me to use operating system APIs to do my Unicode conversions in 
xpcom?

I would prefer not to. As it stands I was going to remove my use of operating 
system APIs in widget and gfx and switch to using Mozilla to get consistent 
conversions.

If we use OS APIs in some places and Mozilla APIs in others, you can end up with 
inconsistent conversion of Unicode, in particular with Japanese.

If you want me to use my OS APIs, I need someone to help me architect the right 
place to put them because right now I have custom unicode conversion functions 
in widget and gfx and I would prefer not to add a third implementation into 
xpcom (we don't have simple APIs to do this, we have to write some functions) - 
see:

http://lxr.mozilla.org/seamonkey/source/gfx/src/os2/nsGfxDefs.cpp#164

and

http://lxr.mozilla.org/seamonkey/source/gfx/src/os2/nsGfxDefs.cpp#242
I'll have an updated patch by weeks end.

--pete
The compiler flag -fshort-wchar causes wcstombs to fail every time on linux.
If i remove it, it works fine. 

Seawood, what do you suggest we do here? This is an autoconf problem.

--pete
I'm going to defer to Akkana & dbaron as to the importance of using
-fshort-wchar.  This doesn't affect our release builds as egcs 1.1.2 doesn't
support the flag but we may want to just beef up the autoconf test with an
explicit wcstombs test.

-fshort-wchar is what allows NS_LITERAL_STRING to do its work at compile time
rather than run time.  That's becoming more and more important for performance
as we use it more.

(I'm not sure why a system API using wchar_t would be useful anyway when wchar_t
is 32-bit and PRUnichar is 16-bit, but I haven't read the whole bug.)
so i was curious about this wchar_t thing and i tried a little test program with
a call to mbstowcs.  i compiled the program with and without -fshort-wchar, and
in both cases it worked correctly.

here's the sample program:

  #include <wchar.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    wchar_t buf[256], *p;
    size_t rv;

    printf("sizeof(wchar_t) = %u\n", sizeof(wchar_t));

    rv = mbstowcs(buf, argv[1], sizeof(buf));

    p = buf;
    while (*p) {
      printf("0x%08x\n", (unsigned int) *p);
      p++;
    }
    return 0;
  }

so, under linux at least it looks like we could happily set -fshort-wchar and
still call the system wchar API.  how'd they do that anyways?! ;-)
So now you're gonna to try and make a lier out of me eh tough guy?

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int
main ()
{
  return printf("nbytes = %d\n", wcstombs(0, L"test", 0));
}


# compiled with -fshort-wchar

$ ./a.out 
nbytes = -1

 compiled w/out -fshort-wchar

$ ./a.out 
nbytes = 4


$ uname -a
Linux LinuxBox 2.4.9-13SGI_XFS_1.0.2 #1 Thu Nov 15 11:38:50 CST 2001 i686 unknown

--pete

pete: i'm running glibc 2.2.4 that comes with redhat 7.2... what version of
glibc are you testing?  (maybe it's a new feature of glibc)
pete: actually, i just tried your example, and i get the same results?!  now i'm
baffled by the results from my test and yours.
Darin, you tested 'mbstowcs' *not* 'wcstombs' which is the method I said was
failing.

The sticky issue is, what is the best way to deal w/ this?

Seems there is certain love for -fshort-wchar and these system API's break when
you are compiling w/ it.

A simple solution is, don't use -fshort-wchar 
But this is mozilla, and simple soultions aren't allowed.  ;-)

So i guess it comes down to what is more important. Breaking xpcom uconv
dependency for unix or the perf benifit of compiling w/ this flag on the unix
systems that actually support it.

--pete



pete: right, but wouldn't you expect wcstombs and mbstowcs to treat wchar_t
similarly?  that's what is so odd about this.

at any rate, once we figure out what to look for we should be able to come up
with some autoconf tests that'll tell us if we need to convert to UCS4 before
calling wcstombs, etc.
Another simple solution would be to have configure figure out the size of the
type that wcstombs takes (perhaps just by defining SYS_WCHAR_T to be |wchar_t|,
unless we use -fshort-wchar, in which case you could make it |unsigned int|). 
(Note that on some Unix-ish platforms, e.g., AIX, wchar_t is 2-byte by default
-- see bug 54564 comment 32 and bug 54564 comment 34.)
ok, i give up... i can't reproduce my results from yesterday.  this is so wierd!
Yea, i'm working on dbarons idea. Trouble is, wcstombs says it wants 4 bytes but
when i give it a 4 byte unsigned int, it still fails when using the flag.

--pete
Ok, can't get wcstombs to work using -fshort-wchar no matter what . . 

However, wctomb works "fine and dandy" w/ or w/out the flag.

So i will change the current implementation to use this method instead.

Then, we'll see what else breaks . . .

** One time, 1978. August. For about an hour. I was both fine and dandy at the
same time. But nobody asked me how I was. I coulda told 'em, "Fine and dandy!" I
consider it a lost opportunity. **  -- george carlin


--pete
so, my patch for bug 129279, requires that this bug be fixed.  so, i hacked
together some conversion routines that work under glibc2.2.4... there's some
work that would have to be done to make this portable.	namely:

1- i'm using the reentrant versions of wctomb/mbtowc which probably aren't
universally available.	if not available, then we need to protect the
conversions functions with a mutex.

2- i'm assuming wchar_t is sized correctly for use with wcrtomb/mbrtowc, which
is not true if compiled using -fshort-wchar.  we'd need some autoconf lovin to
figure out what the correct data type is.

3- i'm making the assumption that file paths in native encoding won't exceed
PATH_MAX bytes.  is this valid?  if so, it allows us to avoid a heap
allocation.
actually, if my assumptions about PATH_MAX are correct, then the first part of
convert_ucs2_to_native can go away.  there's no need to precalculate the length
of the native string.
Attached patch nsLocalFileOS2 fixes (obsolete) — Splinter Review
This is an OS/2 version.

It's essentially the Unix version with new versions of UCS2toFS and FStoUCS2.

I also had to hook startup and shutdown of nsLocalFile so I only have to create
the converters once.
Another good one.

int
main ()
{

  char *aBuffer = "test";
  wchar_t wpath[5];
  mbstowcs(wpath, aBuffer, strlen(aBuffer) + 1);

  printf("wpath[0] %c\n", wpath[0]);
  printf("wpath[1] %c\n", wpath[1]);
  printf("wpath[2] %c\n", wpath[2]);
  printf("wpath[3] %c\n", wpath[3]);
  
  return 0;
}


# with -fshort-wchar

$ ./a.out 
wpath[0] t
wpath[1] 
wpath[2] e
wpath[3] 

# w/out -fshort-wchar

$ ./a.out 
wpath[0] t
wpath[1] e
wpath[2] s
wpath[3] t
Comment on attachment 77533 [details] [diff] [review]
nsLocalFileOS2 fixes

is there any extra error recovery you can do if you find that the worst-case
memory allocation isn't enough? 
I figure *8 should give you PLEANTY but I figured I'd check.
And actually, when going from native->UCS2, it seems like the number of UCS2
characters should always be < the number of native characters, no? so the *4
isn't really necessary? Or maybe I'm overlooking something :)

sr=alecf with or without the *4
Attachment #77533 - Flags: superreview+
Attached patch works on linux and freebsd (obsolete) — Splinter Review
This patch actually works on linux and FreeBSD.  -f-short-wchar or not

I don't have any time to clean it up. 

Darin or Alec, please feel free to scrub it some, otherwise it will have to
wait until some time this weekend.

--pete
Attached patch tested and cleaned up (obsolete) — Splinter Review
I'll take some review on this patch

--pete
Attached patch cleaned up a few more things (obsolete) — Splinter Review
Attachment #70803 - Attachment is obsolete: true
Attachment #77494 - Attachment is obsolete: true
Attachment #77649 - Attachment is obsolete: true
Attachment #77774 - Attachment is obsolete: true
This should be solid enough for a review

--pete
Attachment #77778 - Attachment is obsolete: true
Hmm. I made a comment that never showed.

My *8 is overkill, should be times 4.

The new GB codepage can have 4 bytes representing a native character, so worst 
case UCS2 to Native is 4 bytes per one unicode char.

My understanding is that there are some UCS2 chars that are 4 bytes, so native 
to UCS2 would be a 4 byte worst case.

I can add some error checking in there though.
pete: your patch is not thread safe.  we should either use the reentrant forms
of these functions (e.g., wcrtomb and mbrtowc) or protect the wctomb/mbtowc
calls with a PRLock.  also, there's no need to precalculate the length of the
resulting string when converting from wide to narrow.  you can just push a
PATH_MAX sized buffer on the stack, fill the stack buffer with repeated calls to
wcr?tomb and then finish off with a call to strdup.
 I was under the impression that none of the system calls are . . .
What good is wrapping these calls w/ PRLock and leaving stat and the other calls
alone?

NS_IMPL_THREADSAFE_ISUPPORTS2

What does this do? 

--pete
System calls (calls that request services from the kernel) are all threadsafe,
everywhere I've ever seen threads.  Don't worry yourself about system calls.

wctomb and mbtowc are library calls, not system calls, and are indicated in my
man page as not being multi-thread safe.  Darin: using a PRLock won't be enough,
I suspect: what if we're racing with other uses of wctomb in embedder/component
code, rather than just in this local file code?  We need wcrtomb/mbrtowc, likely.

NS_IMPL_THREADSAFE_ISUPPORTS* provide threadsafe implementations of the basic
nsISupports methods (AddRef and Release; QI doesn't have any state) through use
of the NSPR atomic increment/decrement primitives.
shaver: yeah, good point... a lock wouldn't help us :(  do all platforms (that
we care about) provide the reentrant forms of these functions?  i was afraid
that some might not.  others, might use thread local storage to solve the
problem.  that's what a number of UNIX platforms do to make gethostbyname (for
example) reentrant.  wtc?
What about an autoconf test?

HAVE_REENTRANT_WC_API 

or something . . .

Systems that have the reentrant conversion api's use them. Systems that don't,
don't use them.

Or would Pthreads work here?

--pete
In the case where we don't have a reentrant wctomb/mbtowc, we probably need to
use our own with lock-wrappers _everywhere_ in Mozilla, and warn
embedders/extenders that they should use it as well.  (glue library?)

I don't see how pthreads help us -- what do you have in mind?
FWIW: i did a little survey of tinderbox-ports: Sun and Linux boxes provide
wcrtomb/mbrtowc, whereas IRIX and HPUX boxes don't.  i didn't check AIX.
Darin,

I am not familiar with wctomb/mbtowc, so I don't know if the
platforms we care about provide reentrant forms of these
functions.  Sorry.
Let me try and understand the situation better.

We wrap these libc conversion calls w/ pthread_create and things are cool as long
as the embedder doesn't make a seperate raw call to one of these API's in our
process space.

Is this correct?

--pete
Uh.

When you say "wrap these calls with pthread_create", do you mean "explicitly
create a thread for each call to wctomb, and then somehow synchronize with that
thread later, so that we know the conversion is done"?  I can't understand how
that would help, but I can't come up with an alternate interpretation for that.

Beside the fact that we don't want to use raw pthread-API calls if we can avoid
them, _adding_ threads to the system seems a very odd way to deal with a lack of
threadsafety.

What am I missing?
no, not pthread_create.  we want to protect the calls to wctomb/mbtowc with a
PRLock.  see PR_NewLock, PR_DestroyLock, nsAutoLock.  then provided no one calls
wctomb/mbtowc outside of these locks, we're ok.  to do it right, we should
really move this conversion stuff to some service or extern libxpcom methods, so
other components requiring wctomb/mbtowc would be able to use the methods
safely. NSPR!
Agree, what do wide char conversion routines have to do w/ nsLocalFile class?
Sounds like a plan.  Someone will have to facilitate this. 
I provided a patch, tested and working. I don't have the time to get sucked in
any deeper right now. 
I have two boxes here running smoothly w/out xpcom uconv dependency.

"Pretty please w/ sugar on top,  . . . "

--pete
What do you want to do with this guys? 

Mark 1.01?

I assume the win and mac patches that already landed are "thread safe" right?

--pete
Should I try to get approval and get the OS/2 fixes in now or does it matter?
this needs to go in for mozilla 1.0... bug 129279 depends on this bug.
Depends on: 129279
Well, someone needs to advise here. What is it going to take to land this sucker
in the near term?

--pete
i can take care of landing the XP_UNIX part when i land my changes for bug
129279. that won't be a problem.  and then for OS/2, it's really up to whatever
mkaply wants to do.
My change can land any time. I just need a reviewer.
Attached patch OS/2 only patch (obsolete) — Splinter Review
OK, this should be the final OS/2 fix.

I forgot that on OS/2, filenames are limited to CCHMAXPATH(260) anyway, so why
use really big buffers, use CCHMAXPATH.

I also added asserts just in case things are too big.
Attachment #77533 - Attachment is obsolete: true
Comment on attachment 79452 [details] [diff] [review]
OS/2 only patch

r=pedemont
Attachment #79452 - Flags: review+
Comment on attachment 79452 [details] [diff] [review]
OS/2 only patch

doh! Just noticed this:

>+NS_IMETHODIMP
>+nsLocalFile::AppendRelativeUnicodePath(const PRUnichar *node)
>+{
>+    NS_ENSURE_ARG(node);
>+
>+    if (nsCRT::strlen(node) == 0)
>+        return NS_OK;
>+
>+    nsXPIDLCString tmp;
>+
>+    if (tmp.IsEmpty())
>+        return NS_OK;
>+


and I was about to sr= too! :)
Attachment #79452 - Flags: needs-work+
Attached patch New patchSplinter Review
I have no idea where that came from.

The only OS/2 specific stuff in here is the UCStoFS and FStoUCS stuff.

The rest of the APIs I block copied from nsLocalFileUnix.
Attachment #79452 - Attachment is obsolete: true
Comment on attachment 79764 [details] [diff] [review]
New patch

excellent. Then sr=alecf
Attachment #79764 - Flags: superreview+
Comment on attachment 79764 [details] [diff] [review]
New patch

a=shaver for 1.0 branch.
Attachment #79764 - Flags: approval+
only thing left was the removal of "uconv" from the Makefile.in - anyone want
to review? this is the last nasty dependency from xpcom! Thanks darin!
Comment on attachment 81718 [details] [diff] [review]
remove "uconv" from Makefile.in

r=mcafee, make sure this builds & stuff
Attachment #81718 - Flags: review+
Comment on attachment 81718 [details] [diff] [review]
remove "uconv" from Makefile.in

sr=darin
Attachment #81718 - Flags: superreview+
yay! marking this fixed.. thanks to:
darin and pete for unix work
conrad for mac work
mkaply for os/2 work
me for windows work :)

Status: ASSIGNED → RESOLVED
Closed: 23 years ago22 years ago
Resolution: --- → FIXED
adding fixed1.0.0 keyword (branch resolution). This bug has comments saying it
was fixed on the 1.0 branch and a bonsai checkin comment that agrees. To verify
the bug has been fixed on the 1.0 branch please replace the fixed1.0.0 keyword
with verified1.0.0.
Keywords: fixed1.0.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: