Closed Bug 100676 Opened 23 years ago Closed 22 years ago

xpcom depends on uconv for platform charset info

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.0

People

(Reporter: alecf, Assigned: alecf)

References

Details

(Keywords: embed, topembed-, Whiteboard: fix in hand, need reviews)

Attachments

(8 files, 27 obsolete files)

remove consumers of the nsFileSpec operators that use unicode 23 years ago Alec Flett 12.32 KB, patch	Bienvenu : review+ sspitzer : superreview+	Details \| Diff \| Splinter Review
updated patch to reflect tree changes 23 years ago Alec Flett 11.60 KB, patch		Details \| Diff \| Splinter Review
fix 2 more usage, and remove unicode operators completely 23 years ago Alec Flett 7.45 KB, patch	ccarlen : review+ waterson : superreview+	Details \| Diff \| Splinter Review
Patch so nsLocalFileMac doesn't use uconv for conversion 23 years ago Conrad Carlen (not reading bugmail) 7.93 KB, patch	alecf : superreview+	Details \| Diff \| Splinter Review
Better patch for nsLocalFileMac.cpp 23 years ago Conrad Carlen (not reading bugmail) 34.96 KB, patch	sfraser_bugs : review+ alecf : superreview+	Details \| Diff \| Splinter Review
patch for nsLocalFileWin.cpp 23 years ago Alec Flett 5.08 KB, patch		Details \| Diff \| Splinter Review
nsLocalFileUnix patch 23 years ago Pete Collins (MDG) 6.97 KB, patch		Details \| Diff \| Splinter Review
windows fix using Win32 API 23 years ago Alec Flett 5.37 KB, patch		Details \| Diff \| Splinter Review
nsLocalFileWin.cpp v3, with null checks for CopyTo*/MoveTo 23 years ago Alec Flett 5.57 KB, patch	ccarlen : review+	Details \| Diff \| Splinter Review
unix patch w/ error checking doh! 23 years ago Pete Collins (MDG) 7.40 KB, patch		Details \| Diff \| Splinter Review
nsLocalFileWin.cpp v4, with better error checking 23 years ago Alec Flett 5.83 KB, patch	dveditz : superreview+	Details \| Diff \| Splinter Review
revised patch for Brendan to look at 23 years ago Pete Collins (MDG) 16.80 KB, patch		Details \| Diff \| Splinter Review
This would be the correct patch 23 years ago Pete Collins (MDG) 15.50 KB, patch		Details \| Diff \| Splinter Review
only build nsLocalFileUnicode on Unix and OS/2 23 years ago Alec Flett 1.21 KB, patch		Details \| Diff \| Splinter Review
Full patch for unix - please review 23 years ago Pete Collins (MDG) 15.99 KB, patch		Details \| Diff \| Splinter Review
diff -ubw so alec can read it 23 years ago Pete Collins (MDG) 9.23 KB, patch		Details \| Diff \| Splinter Review
used AC_LANG_CPLUSPLUS & AC_LANG_C 23 years ago Pete Collins (MDG) 9.19 KB, patch		Details \| Diff \| Splinter Review
changing bad variable name use 'aBufSiz' to 'bufLen' 23 years ago Pete Collins (MDG) 9.19 KB, patch		Details \| Diff \| Splinter Review
using wctomb to allocate str 23 years ago Pete Collins (MDG) 9.25 KB, patch		Details \| Diff \| Splinter Review
adding nbytes test for failure 23 years ago Pete Collins (MDG) 9.25 KB, patch		Details \| Diff \| Splinter Review
correct patch 23 years ago Pete Collins (MDG) 9.38 KB, patch		Details \| Diff \| Splinter Review
adding checks for empty string params 23 years ago Pete Collins (MDG) 9.91 KB, patch		Details \| Diff \| Splinter Review
we should be getting close 23 years ago Pete Collins (MDG) 10.09 KB, patch		Details \| Diff \| Splinter Review
a little bit closer . . . 23 years ago Pete Collins (MDG) 10.63 KB, patch		Details \| Diff \| Splinter Review
removed final size_t < 0 comparison 23 years ago Pete Collins (MDG) 10.66 KB, patch	alecf : review+	Details \| Diff \| Splinter Review
updated patch w/ Brendan's suggested changes 23 years ago Pete Collins (MDG) 10.59 KB, patch	alecf : review+	Details \| Diff \| Splinter Review
conversion routines using wcrtomb/mbrtowc (not a complete solution) 22 years ago Darin Fisher 2.41 KB, text/plain		Details
nsLocalFileOS2 fixes 22 years ago Mike Kaply [:mkaply] 17.15 KB, patch	alecf : superreview+	Details \| Diff \| Splinter Review
works on linux and freebsd 22 years ago Pete Collins (MDG) 10.38 KB, patch		Details \| Diff \| Splinter Review
tested and cleaned up 22 years ago Pete Collins (MDG) 10.44 KB, patch		Details \| Diff \| Splinter Review
cleaned up a few more things 22 years ago Pete Collins (MDG) 10.38 KB, patch		Details \| Diff \| Splinter Review
forgot the most important part of the patch, to remove 'uconv' dependency from Makefile 22 years ago Pete Collins (MDG) 10.54 KB, patch		Details \| Diff \| Splinter Review
OS/2 only patch 22 years ago Mike Kaply [:mkaply] 8.30 KB, patch	jhpedemonte : review+	Details \| Diff \| Splinter Review
New patch 22 years ago Mike Kaply [:mkaply] 7.54 KB, patch	alecf : superreview+ shaver : approval+	Details \| Diff \| Splinter Review
remove "uconv" from Makefile.in 22 years ago Alec Flett 467 bytes, patch	mcafee : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review

Alec Flett

Assignee

Description

•

23 years ago

yuck.

Alec Flett

Assignee

Comment 1

•

23 years ago

upon investigation, these are the only files using this stuff:
nsFileSpec.cpp: for nsIPlatformCharset.h
nsLocalFileCommon.cpp: for nsIPlatformCharset.h
nsUnicharInputStream.cpp: for nsICharsetConverterManager.h

Blocks: 66759

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla0.9.6

Conrad Carlen (not reading bugmail)

Comment 2

•

23 years ago

As far as the dependencies between nsFileSpec and nsLocalFileCommon and uconv,
the way around this is to avoid the conversion altogether - use Unicode FS APIs
where supported (WinNT/2000, Mac OS9+), and have nsIFile be Unicode-only (bug
94323). For file systems which are not natively Unicode, FS <-> Unicode name
mapping can't be avoided but could be done with OS routines - Pete is working on
that for Unix and I've done it already for the Mac.

Alec Flett

Assignee

Updated

•

23 years ago

Depends on: 94323

Alec Flett

Assignee

Updated

•

23 years ago

Priority: -- → P2

Alec Flett

Assignee

Updated

•

23 years ago

Depends on: 103384

Alec Flett

Assignee

Comment 3

•

23 years ago

yay, people are working on this for bug 94923..

*** This bug has been marked as a duplicate of 94923 ***

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → DUPLICATE

Matthias Versen [:Matti]

Comment 4

•

23 years ago

alecf@netscape.com: 
I don't think your dupe is correct !
Do you mean bug 94323 ?

Conrad Carlen (not reading bugmail)

Comment 5

•

23 years ago

I don't think this is a dup of bug 94323 but instead should block it. Changing
to Unicode-only APIs on nsIFile will put demands on the implementation of
nsLocalFile for each platform. Using uconv to do the conversion has problems and
we'll be forcing much more code through that point. I think what should happen
first is that we get Unicode file handling nailed down on each platform first,
then change the API.

Alec Flett

Assignee

Comment 6

•

23 years ago

talked with conrad.. reopening to keep this problems in small and managable
chunks...

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Alec Flett

Assignee

Updated

•

23 years ago

Status: REOPENED → ASSIGNED

Alec Flett

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.6 → mozilla0.9.7

Alec Flett

Assignee

Comment 7

•

23 years ago

no way I'll hit this for 0.9.6

Alec Flett

Assignee

Updated

•

23 years ago

Depends on: 110371

Alec Flett

Assignee

Updated

•

23 years ago

Depends on: 110531

Alec Flett

Assignee

Comment 8

•

23 years ago

Attached patch remove consumers of the nsFileSpec operators that use unicode (obsolete) — Details — Splinter Review

nsFileSpec is so lame. I found more stuff in nsLocalFileUnicode (only found it
thanks to conrad!)

Anyway, I removed the operators, and fixed all the consumers which were using
them. 
since its mostly mailnews, I'll try to get sspitzer to review.

Alec Flett

Assignee

Comment 9

•

23 years ago

Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

ugh, I just realized there's a bunch of stuff from bug 110531 in there. I'll
try to land that first.

Alec Flett

Assignee

Comment 10

•

23 years ago

also, in that patch you'll note that in almost every case of using nsString, we
actually had an ASCII version of the filename that we were blindly converting to
unicode, and then back to a char*.. so no data is lost here.

jag (Peter Annema)

Comment 11

•

23 years ago

Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

>Index: profile/pref-migrator/src/nsPrefMigration.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/profile/pref-migrator/src/nsPrefMigration.cpp,v
>retrieving revision 1.157
>diff -u -r1.157 nsPrefMigration.cpp
>--- nsPrefMigration.cpp	2001/11/07 06:23:53	1.157
>+++ nsPrefMigration.cpp	2001/11/30 22:20:29
>@@ -1353,7 +1353,7 @@
> }
> 
> static PRBool
>-nsStringStartsWith(nsString& name, const char *starting)
>+nsCStringStartsWith(nsString& name, const char *starting)

Shouldn't that be nsCString& then? Is this code used at all?

I'll continue when you're done with bug 110531, just wanted to comment on the
above.

Alec Flett

Assignee

Comment 12

•

23 years ago

you're right! that function is only used inside #ifdef
NEED_TO_COPY_AND_RENAME_NEWSRC_FILES - so I wrapped that function with it as
well, and updated it to nsCString

David :Bienvenu

Comment 13

•

23 years ago

Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

r=bienvenu, on the mail/news changes.

Attachment #59919 - Flags: review+

(not reading, please use seth@sspitzer.org instead)

Comment 14

•

23 years ago

Comment on attachment 59919 [details] [diff] [review]
remove consumers of the nsFileSpec operators that use unicode

sr=sspitzer on the mailnews stuff, and the profile stuff.

we should definitely give a heads up to ji@netscape.com and
nhotta@netscape.com, so that nhotta can review and ji can check on a non
US-ASCII machine.

Attachment #59919 - Flags: superreview+

Alec Flett

Assignee

Comment 15

•

23 years ago

Attached patch updated patch to reflect tree changes — Details — Splinter Review

This is the same patch, but the tree changed slightly, so this one should apply
more cleanly

Attachment #59919 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 16

•

23 years ago

adding nhotta and ji - is there any chance you guys can test this patch on a
japanese (or other non-european) system? The things we need to make sure work:
1) migrating from a 4.x profile, just make sure that messages in local mail
folders aren't lost
2) Compacting folders with non-Latin1 names doesn't erase the folder
(File->Compact Folders) on both IMAP and POP
3) Creating a new folder with a non-Latin1 name continues to work, on both IMAP
and POP

Thanks guys!

nhottanscp

Comment 17

•

23 years ago

About the patch, how is that related to 'platform charset'?

Alec Flett

Assignee

Comment 18

•

23 years ago

well, nsFileSpec has an API that takes nsStrings, and the only way that API is
implemented is by using the platform (or system?) charset and then call the
appropriate char* method. We're trying to remove calls to uconv from xpcom, and
instead place that burden on the consumers.

nhottanscp

Comment 19

•

23 years ago

> We're trying to remove calls to uconv from xpcom, and
> instead place that burden on the consumers.
I see, like this. Looks fine for me.

+    nsXPIDLCString nativeFolderName;
+    ConvertFromUnicode(nsMsgI18NFileSystemCharset(), nsAutoString(folderName),
+                       getter_Copies(nativeFolderName));
+    
+
path += nativeFolderName.get();

Alec Flett

Assignee

Comment 20

•

23 years ago

well, I still need to remove the old nsFileSpec APIs, so until I get to that,
this bug ain't going nowhere. moving to 0.9.8 :(
I know, I'm as disappointed as you are.

Target Milestone: mozilla0.9.7 → mozilla0.9.8

Alec Flett

Assignee

Comment 21

•

23 years ago

Attached patch fix 2 more usage, and remove unicode operators completely — Details — Splinter Review

Since the last patch was checked in, we had two more consumers of these
routines added. this patch fixes both consumers, and removes the evil operators
and methods all together.
can I get a r=conrad and sr=jag?

Conrad Carlen (not reading bugmail)

Updated

•

23 years ago

Attachment #64162 - Flags: review+

Conrad Carlen (not reading bugmail)

Comment 22

•

23 years ago

Comment on attachment 64162 [details] [diff] [review]
fix 2 more usage, and remove unicode operators completely

Glad to see them go. r=ccarlen

Conrad Carlen (not reading bugmail)

Comment 23

•

23 years ago

Though the r= stands, I can never miss an opportunity to encourage killing off
an nsFileSpec usage ;-)

This one:

 nsSpecialSystemDirectory
dtdPath(nsSpecialSystemDirectory::OS_CurrentProcessDirectory);
-        dtdPath += NS_ConvertASCIItoUCS2(nsDependentCString(kDTDDirectory) +
fileName);

begs for replacement by directory service and nsILocalFile. Then, you could just
use nsIFile::Append(const char*)

Chris Waterson

Comment 24

•

23 years ago

Comment on attachment 64162 [details] [diff] [review]
fix 2 more usage, and remove unicode operators completely

sr=waterson

Attachment #64162 - Flags: superreview+

Alec Flett

Assignee

Comment 25

•

23 years ago

conrad - I'll get to that at some point I swear :) I just keep cleaning up the
tree to get rid of these consumers, and people keep adding them back...in any
case the above patch has been checked in. thanks guys.

Alec Flett

Assignee

Comment 26

•

23 years ago

*sigh* 0.9.8 was way to short. moving all my bugs to 0.9.9

Alec Flett

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.8 → mozilla0.9.9

Conrad Carlen (not reading bugmail)

Comment 27

•

23 years ago

Attached patch Patch so nsLocalFileMac doesn't use uconv for conversion (obsolete) — Details — Splinter Review

This patch uses nsFSStringConversionMac which I put into nsLocalFileMac a while
back. This object uses native conversion facilities so uconv is not needed at
all. Notice nsLocalFileUnicode is gone from the project :-) Aside from no uconv
dep, this won't really pay until bug 95481 is done. With that, there will be no
conversion of any kind, except on pre-OS9 systems.

Alec Flett

Assignee

Comment 28

•

23 years ago

Comment on attachment 65780 [details] [diff] [review]
Patch so nsLocalFileMac doesn't use uconv for conversion

Awesome! Let's get a mac person to review this, but the concept looks great to
me.

sr=alecf

Attachment #65780 - Flags: superreview+

Simon Fraser [no longer active]

Comment 29

•

23 years ago

Regarding attachment 65780 [details] [diff] [review]:
Hrmm, ugly macro foo. The macros hide return logic etc, which is unfortunate. 
Isn't there a better way to do this?

OS: Windows 2000 → All

Hardware: PC → All

Conrad Carlen (not reading bugmail)

Comment 30

•

23 years ago

> Isn't there a better way to do this?
Yeah - expand the macros ;-) I based this on the code here:
http://lxr.mozilla.org/mozilla/source/xpcom/io/nsLocalFileUnicode.cpp#90, but
cut down the number of macros. I'll just get rid of them in a new patch.

Conrad Carlen (not reading bugmail)

Comment 31

•

23 years ago

Attached patch Better patch for nsLocalFileMac.cpp — Details — Splinter Review

Patch does two things:
(1) Gets rid of macros
(2) GetUnicodePath(), if HFSPlus is available, doesn't just call GetPath() and
convert that to Unicode. With HFSPlus, the extant portion of the path is gotten
directly in Unicode in the first place. So, I made a static utility function
HFSPlusGetRawPath() which is used by both versions of GetPath(). That involved
moving (but not changing) some code. The unicode conversion class had to be
moved up in the file. I moved the path parser object along with it.
I'm describing this because diff made a mess out of the diff. It's hard to tell
what happened. The old equivalent of diff in MPW had a great feature where you
could specify the minimum number of matching lines before it considered the two
to be "in sync" I guess diff doesn't have that and a simple matching brace
somewhere will cause it to end the non-matching run.

Attachment #65780 - Attachment is obsolete: true

Conrad Carlen (not reading bugmail)

Comment 32

•

23 years ago

Naoki, can you apply the patch in attachment 66286 [details] [diff] [review] and try it out on a Japanese
system?

nhottanscp

Comment 33

•

23 years ago

I applied the patch to Classic build and ran on MacOS 9.1 JA.
I tested creating JA profile name, saved .html in JA file name, opened .html in
JA file name, I did not see problems.

Conrad Carlen (not reading bugmail)

Comment 34

•

23 years ago

Simon, based on Naoki's testing on OS 9.1 JA, I'd say it's good. Can you r=? The
only difference between 9.1 and OSX is, on OSX, the direct-from-unicode path is
used. But, that code has been in place in GetPath() for a while now so I'm
pretty sure of it.

Alec Flett

Assignee

Comment 35

•

23 years ago

Attached patch patch for nsLocalFileWin.cpp (obsolete) — Details — Splinter Review

working off of conrad's example, I've done the windows work using the windows
system API's |wcstombs| and |mbstowcs|

Looking for reviewers...

Alec Flett

Assignee

Comment 36

•

23 years ago

nhotta, would you mind trying the windows patch, attachment 66916 [details] [diff] [review], for me, like
you did for conrad on the mac? thanks.

pete, do you have similar changes for unix?

Pete Collins (MDG)

Comment 37

•

23 years ago

Alec, yea I did this a while ago. I need to break it out of bug 94323 and
provide a patch here.

I should be able to get to this tonight.

--pete

nhottanscp

Comment 38

•

23 years ago

Mozilla usually use WinAPI WideCharToMultiByte and MultiByteToWideChar instead of 
wcstombs and mbstowcs. Those WinAPI might have more coverage for charsets than
the runtime library functions, also that's more consistant with other parts of
the code in Mozilla.
I'll try the patch by setting the default system locale to JA on my Windows 2000.

Pete Collins (MDG)

Comment 39

•

23 years ago

Attached patch nsLocalFileUnix patch (obsolete) — Details — Splinter Review

Pete Collins (MDG)

Comment 40

•

23 years ago

I would need a Japanese machine to test out the unix code as well. 

After applying the patch, you need to run autoconf, reconfigure and compile.

Thanks

--pete

Alec Flett

Assignee

Comment 41

•

23 years ago

Attached patch windows fix using Win32 API (obsolete) — Details — Splinter Review

Naoki - how does this look? I switched to the Win32 API (i.e.
WideCharToMultiByte and MultiByteToWideChar)

Attachment #66962 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 42

•

23 years ago

Comment on attachment 66962 [details] [diff] [review]
nsLocalFileUnix patch

oops.. sorry I obsoleted pete's patch. ignore that.

Attachment #66962 - Attachment is obsolete: false

Alec Flett

Assignee

Updated

•

23 years ago

Attachment #66916 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 43

•

23 years ago

Comment on attachment 66286 [details] [diff] [review]
Better patch for nsLocalFileMac.cpp

sr=alecf on the mac patch btw, assuming a mac person reviews it.

Attachment #66286 - Flags: superreview+

Alec Flett

Assignee

Comment 44

•

23 years ago

adding dveditz for super review on attachment 66977 [details] [diff] [review]
ccarlen or nhotta, do one of you mind reviewing it?

Conrad Carlen (not reading bugmail)

Comment 45

•

23 years ago

Comment on attachment 66977 [details] [diff] [review]
windows fix using Win32 API

Both CopyTo* and MoveTo* can take null as the name param. Check for that first.

Pete Collins (MDG)

Comment 46

•

23 years ago

cc'ing brendan to see if he can look at the unix patch.

--pete

Alec Flett

Assignee

Comment 47

•

23 years ago

Attached patch nsLocalFileWin.cpp v3, with null checks for CopyTo*/MoveTo (obsolete) — Details — Splinter Review

Ok, 3rd time's the charm :)

Attachment #66977 - Attachment is obsolete: true

nhottanscp

Comment 48

•

23 years ago

wcstombs and mbstowcs not working fine on my environment (Windows2000 default
charset set to JA). mbstowcs seems to infrate a byte to two bytes by casting and
wcstombs does the other way. They might need localized systems to work properly.
I will try the latest patch.

nhottanscp

Comment 49

•

23 years ago

With the latest patch (id=66987), I was able to create and delete a profile in
JA name, also able to save a file in JA file name, tested on Windows 2000
default system set to JA.

Alec Flett

Assignee

Updated

•

23 years ago

Whiteboard: fix in hand, need reviews

Pete Collins (MDG)

Comment 50

•

23 years ago

Attached patch unix patch w/ error checking doh! (obsolete) — Details — Splinter Review

Pete Collins (MDG)

Updated

•

23 years ago

Attachment #66962 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 51

•

23 years ago

Comment on attachment 66962 [details] [diff] [review]
nsLocalFileUnix patch

>Index: configure.in

cls@seawood.org should review the configure.in change.

>Index: xpcom/io/nsLocalFileUnix.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/xpcom/io/nsLocalFileUnix.cpp,v
>retrieving revision 1.83
>diff -u -r1.83 nsLocalFileUnix.cpp
>--- xpcom/io/nsLocalFileUnix.cpp	2002/01/28 07:02:02	1.83
>+++ xpcom/io/nsLocalFileUnix.cpp	2002/01/29 22:26:19
>@@ -1536,3 +1536,168 @@
>     *result = file;
>     return NS_OK;
> }
>+
>+static nsresult 
>+UCS2toFS(const PRUnichar *aBuffer, char **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+    char str[BUFSIZ];                     
>+    sys_wchar_t wpath[BUFSIZ];           
>+    size_t len = 0;                     
>+    while (aBuffer[len] != L'\0') {    
>+        wpath[len] = aBuffer[len];          
>+        ++len;                         
>+    }

Argh, potential buffer overrun here, plus a big stack buffer (BUFSIZ is good
for stdio, who says it's right for this use?).	You should call nsCRT::strlen
on aBuffer, and use a sys_wchar_t smallbuf[64] if possible, else malloc.

Nit: lots of random trailing spaces on the above lines -- :%s/	*$// please!

>+    wpath[len] = L'\0';             
>+    if (wcstombs(str, wpath, BUFSIZ) < 0 )   
>+        return NS_ERROR_FAILURE;              
>+
>+    *aResult = nsCRT::strdup(str);

Is there no way to tell how much space the mbs version of the wcs will be?  Can
you allocate str dynamically to worst-case size, then realloc downward?  You're
going to have to allocate dynamically anyway.
>+static nsresult 
>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+    sys_wchar_t wpath[BUFSIZ];
>+    if (mbstowcs(wpath, aBuffer, BUFSIZ) < 0)           
>+        return NS_ERROR_FAILURE;                           
>+    size_t len = 0;                                     
>+    PRUnichar wbuf[BUFSIZ];

Same comments here.

/be

Attachment #66962 - Attachment is obsolete: false

Attachment #66962 - Flags: needs-work+

Pete Collins (MDG)

Comment 52

•

23 years ago

What about this?

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);
    size_t bufsiz = 64;
    if ((nsCRT::strlen(aBuffer) / 8) > bufsiz) {
        NS_ERROR("path is to large . . .\n");
        return NS_ERROR_OUT_OF_MEMORY;
    }
    sys_wchar_t *wpath = (sys_wchar_t *)nsMemory::Alloc(bufsiz);
    if (!wpath)
        return NS_ERROR_OUT_OF_MEMORY;
    size_t len = 0;
    while (aBuffer[len] != L'\0') {
        wpath[len] = aBuffer[len];
        ++len;
    }
    wpath[len] = L'\0';
    char *str = (char *)nsMemory::Alloc(bufsiz);
    if (!str)
        return NS_ERROR_OUT_OF_MEMORY;
    if ((len = wcstombs(str, wpath, bufsiz)) < 0 ) {
        nsMemory::Free(wpath);
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = nsCRT::strndup(str, len);
    if (!*aResult) {
        nsMemory::Free(wpath);
        nsMemory::Free(str);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    return NS_OK;
}

Alec Flett

Assignee

Comment 53

•

23 years ago

according to the manpage, the wcstombs and mbstowcs return the number of bytes
required.. take a look at my attachment 66916 [details] [diff] [review] for how I did it over there.

Since we need to allocate space for the result, we might as well get the right
number of bytes - I don't see a need for a temporary buffer.

Brendan Eich [:brendan]

Comment 54

•

23 years ago

alecf, don't we need a copy from PRUnichar array to sys_wchar_t array?  Those
types may not be the same size.  That's why I mentioned a small stack buffer to
avoid malloc'ing all the time -- but now that you mention it, we could autoconf
a macro by which to test whether PRUnichar is smaller than sys_wchar_t.

We could even do a runtime if-else test conditioned by (sizeof(PRUnichar) !=
sizeof(sys_wchar_t)) and let the compiler eliminate dead code (probably it would
warn).

BTW, your cited patch doesn't check for OOM.

Pete, I don't think this is right at all:

    size_t bufsiz = 64;
    if ((nsCRT::strlen(aBuffer) / 8) > bufsiz) {
        NS_ERROR("path is to large . . .\n");
        return NS_ERROR_OUT_OF_MEMORY;
    }

The while loop that follows is really a for loop, too.  I think the L'\0' is not
quite right when testing a PRUnichar, either.  And don't strndup str, it's
already allocated and ready to return to the caller.

Here's my version:

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);

    sys_wchar_t *wpath, wbuf[64];
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAS(sys_wchar_t *, nsMemory::Alloc(size));
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        for (size_t i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }

    size_t nbytes = wcstombs(nsnull, wpath, 0) + 1;
    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
    if (!str) {
        if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    nbytes = wcstombs(str, wpath, nbytes);
    if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}

/be

Conrad Carlen (not reading bugmail)

Comment 55

•

23 years ago

Comment on attachment 66987 [details] [diff] [review]
nsLocalFileWin.cpp v3, with null checks for CopyTo*/MoveTo

r=carlen
One suggestion though: Break statements like

+    if (!newName) return CopyTo(newParentDir, nsnull);

onto two lines. Just in case somebody might want to set a breakpoint on that
condition.

Attachment #66987 - Flags: review+

Pete Collins (MDG)

Comment 56

•

23 years ago

Brendan, I had to make some changes to get the code to compile, not warn and
then work, respectively.

> size_t nbytes = wcstombs(nsnull, wpath, 0) + 1;

Doesn't fly, you will always get "nbytes = 0".

You will notice i allocated a default buffer size for "str" which you are not
going to like.
Please advise on the best thing to do there.  
Not sure if you will like 'wpathIsAlloc' either . . .


static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);
    NS_ENSURE_ARG_POINTER(aResult);
    sys_wchar_t *wpath, wbuf[64];
    PRBool wpathIsAlloc = PR_FALSE;
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, (PRUnichar*)aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(size));
            wpathIsAlloc = PR_TRUE;
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        size_t i;
        for (i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }

    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(64));
    if (!str) {
        if (wpathIsAlloc)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    size_t nbytes = wcstombs(str, wpath, 64);
    nbytes = wcstombs(str, wpath, nbytes);
    if (wpathIsAlloc)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}

Brendan Eich [:brendan]

Comment 57

•

23 years ago

pete, wcstombs(3) says that if you pass null for the first (dest) param, the
third (n) param is ignored, and there is no limit on the returned byte count,
which can be used to allocate.  So I don't see why you need to pass 64 to the
first call, nor does that seem big enough in general.  The code alecf wrote does
what I sketched, too (attachment 66916 [details] [diff] [review]). 

Thanks for fixing my code, I was hacking in the bugzilla comment input textarea,
always dangerous.  I don't mind the wpathIsAlloc boolean, thought of that myself
but decided the uglier condition was ok to repeat, as the first clause is a
compile-time constant, and the wpath != wbuf is the important bit.  But either
is ok (except maybe call it wpathWasAlloced for grammatical purity of essence?).

/be

Brendan Eich [:brendan]

Comment 58

•

23 years ago

Oops also on the NS_REINTERPRET_CAST(sys_wchar_t *, aBuffer) where aBuffer was
const PRUnichar * -- but pete, you can't fix that with an old-style C cast and
not run the risk of someone passing a string constant in and this code crashing
trying to store into readonly memory.  I made the aBuffer param a pointer to
non-const here.

I also got rid of the NS_ENSURE_ARG_POINTER(aResult), it seems silly especially
in a static function.  The static-ness lets anyone inspect nsLocalFileUnix.cpp
only, in order to find all calls and see that they follow the XPCOM rules.  In
that vein, I thought it was ok to eliminate the const from aBuffer's type -- it
is the only way to avoid a copy when sizeof(sys_wchar_t) == sizeof(PRUnichar).

static nsresult
UCS2toFS(PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);

    sys_wchar_t *wpath, wbuf[64];
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(size));
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        size_t i;
        for (i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }

    size_t nbytes = wcstombs(nsnull, wpath, 0) + 1;
    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
    if (!str) {
        if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    nbytes = wcstombs(str, wpath, nbytes);
    if (sizeof(PRUnichar) != sizeof(sys_wchar_t) && wpath != wbuf)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}

/be

Pete Collins (MDG)

Comment 59

•

23 years ago

Brendan, on linux it says that.  ;-)

> If  dest  is  NULL, n is ignored, and the conversion proceeds as above, except
that
> the converted bytes are not written out to memory, and that no length limit
exists.

Not on FreeBSD. Can't tell you about other nX flavors  . . .

--pete

Pete Collins (MDG)

Comment 60

•

23 years ago

Brendan, can you show me how to crash it using (PRUnichar *) cast?
I can't get it to crash with the cast there.  It sucks having the compiler
warning there. 

But do we really need to use NS_REINTERPRET_CAST?

Can't we just do something like this?

    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
         wpath = (sys_wchar_t *) aBuffer;

and just make the copy since they are the same types in this particular case anyway?

--pete

Simon Fraser [no longer active]

Comment 61

•

23 years ago

Comment on attachment 66286 [details] [diff] [review]
Better patch for nsLocalFileMac.cpp

r=sfraser. But what will all this unicode conversion stuff do to our startup
performance?

Attachment #66286 - Flags: review+

Brendan Eich [:brendan]

Comment 62

•

23 years ago

Pete, I'm wrong: you should use NS_CONST_CAST just for C++ purity and leave the
aBuffer parameter type const PRUnichar *:

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);

    sys_wchar_t *wpath, wbuf[64];
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_CONST_CAST(sys_wchar_t *, aBuffer);
    } else {
 . . .

the NS_ENSURE_ARG could go to, for the reason I gave against
NS_ENSURE_ARG_POINTER(aResult).

/be

Brendan Eich [:brendan]

Comment 63

•

23 years ago

Is FreeBSD's wcstombs really busted?  News to alecf, I bet!

/be

Alec Flett

Assignee

Comment 64

•

23 years ago

simon: we're already paying that price right now, in fact we have the overhead
of going through our own Unicode decoders which are likely to be just slightly
slower because they are designed to handle arbitrary charset decoding/encoding.

Pete Collins (MDG)

Comment 65

•

23 years ago

Yup, wcstombs appears to be busted on FreeBSD.

You can't do this:

len = wcstombs(0, wcs, 0);

len is -1

--pete

Pete Collins (MDG)

Comment 66

•

23 years ago

Here's the FreeBSD bug tracking the issue.

http://www.freebsd.org/cgi/query-pr.cgi?pr=17694

> This is a nonstandard extension.  It is not present in C89 or in at
> least the n869 draft of C99.

Interseting . . .

--pete

Pete Collins (MDG)

Comment 67

•

23 years ago

Brendan, this looks pretty FreeBSD specific, so i can just ifdef it for now.

--pete

Alec Flett

Assignee

Comment 68

•

23 years ago

From the log, it also sounds like ONLY FreeBSD is broken. (there is mention of
HP-UX, AIX, Irix, Solaris, Linux, and Tru64)

Can we have a configure test to see if this function is broken, and then have
some kind of temporary buffer only for the broken platform?

Chad Austin

Comment 69

•

23 years ago

Very weird.  I just looked in the C99 standard, and it says nothing about 
counting the number of characters required if NULL is passed in as the 
destination pointer.

(just confirming Bruce Evans' claim)

Pete Collins (MDG)

Comment 70

•

23 years ago

Hence, 

> MSVC++ 6.0 does it too (might be a hint where it comes from :-( ).

heh

--pete

Pete Collins (MDG)

Comment 71

•

23 years ago

size_t wcstombs(char *s, const wchar_t *pwcs, size_t n);

Converts a sequence of codes that correspond to multibyte characters from the
array pointed to by pwcs into a sequence of multibyte characters that begins in
the initial shift state, and stores these multibyte characters into the array
pointed to by s. The conversion stops if a multibyte character would exceed the
limit of n total bytes or if a null character is stored.
Each code is converted as if by a call to wctomb , except that the shift state
of wctomb is not affected.
If a code is encountered that does not correspond to a valid multibyte
character, the wcstombs function returns (size_t) - 1 . Otherwise, it returns
the number of bytes modified, not including a terminating null character, if any.

Pete Collins (MDG)

Comment 72

•

23 years ago

What about this?

static nsresult
UCS2toFS(const PRUnichar *aBuffer, char **aResult)
{
    NS_ENSURE_ARG(aBuffer);
    sys_wchar_t *wpath, wbuf[64];
    PRBool wpathWasAlloced = PR_FALSE;
    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
        wpath = NS_CONST_CAST(sys_wchar_t *, (sys_wchar_t *)aBuffer);
    } else {
        PRUint32 length = nsCRT::strlen(aBuffer);
        size_t size = (length + 1) * sizeof(PRUnichar);
        if (size <= sizeof(wbuf)) {
            wpath = wbuf;
        } else {
            wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(size));
            wpathWasAlloced = PR_TRUE;
            if (!wpath)
                return NS_ERROR_OUT_OF_MEMORY;
        }
        size_t i;
        for (i = 0; aBuffer[i] != 0; i++)
            wpath[i] = aBuffer[i];
        wpath[i] = L'\0';
    }
#if defined(__FreeBSD__)
    char tmpbuf[64];
    size_t nbytes = wcstombs(tmpbuf, wpath, 64);
#else
    size_t nbytes = wcstombs(nsnull, wpath, 0);
#endif
    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
    if (!str) {
        if (wpathWasAlloced)
            nsMemory::Free(wpath);
        return NS_ERROR_OUT_OF_MEMORY;
    }
    nbytes = wcstombs(str, wpath, nbytes);
    if (wpathWasAlloced)
        nsMemory::Free(wpath);
    if (nbytes == size_t(-1)) {
        nsMemory::Free(str);
        return NS_ERROR_FAILURE;
    }
    *aResult = str;
    return NS_OK;
}

Daniel Veditz [:dveditz]

Comment 73

•

23 years ago

Comment on attachment 66987 [details] [diff] [review]
nsLocalFileWin.cpp v3, with null checks for CopyTo*/MoveTo

>+    // add 1 for null termination
>+    size_t chars = ::WideCharToMultiByte(CP_ACP, 0,
>+                                         aBuffer, -1,
>+                                         NULL, 0, NULL, NULL) + 1;

Since you used -1 to have the routine calculate the length then it'll
include the null terminator.

>+    if (chars == 1)
>+        return NS_ERROR_FAILURE;

Therefore this should be a check for 0

>+    *aResult = (char*)nsMemory::Alloc(chars * sizeof(char));
>+    chars = ::WideCharToMultiByte(CP_ACP, 0,
>+                                  aBuffer, -1,
>+                                  *aResult, chars, NULL, NULL);

Have you tested that this works if the Alloc fails? Some msvcrt routines
(strcpy, etc) crash when given null pointers, although the docs do say
this routine checks that the two buffers aren't the same so maybe it
does handle null OK.

>+    if (chars >= 0)
>+        (*aResult)[chars] = '\0'; // null terminate
>+
>+    return NS_OK;

Since you've used -1 to get the length the null terminator is included
in the conversion. However, if chars *is* 0 you've got a problem so do
you still want to return NS_OK?

>+static nsresult FStoUCS2(const char* aBuffer, PRUnichar **aResult)

Ditto the above for this version.

Alec Flett

Assignee

Comment 74

•

23 years ago

Attached patch nsLocalFileWin.cpp v4, with better error checking — Details — Splinter Review

Ok, 4th time's the charm? I had left in the bogus null-termination logic from
when I was using wcstombs - whoops! Anyway, this patch:
- checks its arguments for null
- checks memory allocation failure
- checks failure of the conversion routines

Reviews?

Attachment #66962 - Attachment is obsolete: true

Attachment #66987 - Attachment is obsolete: true

Daniel Veditz [:dveditz]

Updated

•

23 years ago

Attachment #67303 - Flags: superreview+

Daniel Veditz [:dveditz]

Comment 75

•

23 years ago

Comment on attachment 67303 [details] [diff] [review]
nsLocalFileWin.cpp v4, with better error checking

sr=dveditz

Alec Flett

Assignee

Comment 76

•

23 years ago

thanks guys - windows and mac have been checked in. Now we just need unix and
we're done! I'll try to get to the review today.

Pete Collins (MDG)

Comment 77

•

23 years ago

I can finish the unix patch if the last UCS2toFS method posted is the way to go
here.

--pete

Brendan Eich [:brendan]

Comment 78

•

23 years ago

pete, what if the string at aBuffer has 64 or more characters in it?

#if defined(__FreeBSD__)
    char tmpbuf[64];
    size_t nbytes = wcstombs(tmpbuf, wpath, 64);
#else
 . . .

Won't freebsd versions of Mozilla truncate the string brutally?

/be

Pete Collins (MDG)

Comment 79

•

23 years ago

Brendan, originally i was using BUFSIZ and you said it was to big. 

This is only a temporary buffer so we can allocate the real one after.

I'll go back to using BUFSIZ.

--pete

Brendan Eich [:brendan]

Comment 80

•

23 years ago

Pete, my point is that you are bounding the string conversion by the size of the
stack buffer.  My point against BUFSIZ (it's good for stdio i/o buffering, but
what does that have to do with your intended use?) still stands -- plus, I am
saying that you should find a way to have no arbitrary bound on the string
length, unless we use this only for strings that have some bound already -- such
as PATH_MAX (scaled by sizeof(sys_wchar_t) of course).

Is there such a bound on the strings that come in via aBuffer?  If not, how do
other FreeBSD programs use wcstombs without imposing arbitrary limits?  Do they
use wcstomb instead, iterating through an arbitrary length string using a fixed
size buffer?

/be

Pete Collins (MDG)

Comment 81

•

23 years ago

Attached patch revised patch for Brendan to look at (obsolete) — Details — Splinter Review

Attachment #67000 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 82

•

23 years ago

Attached patch This would be the correct patch (obsolete) — Details — Splinter Review

Attachment #68255 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 83

•

23 years ago

Comment on attachment 68256 [details] [diff] [review]
This would be the correct patch

Retracting patch. I have some more things to clean fix.

Attachment #68256 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 84

•

23 years ago

Attached patch only build nsLocalFileUnicode on Unix and OS/2 — Details — Splinter Review

with the switch to gmake on win32, we re-gained this dependency, and started
building nsLocalFileUnicode on Windows again. This patch makes that and the
REQUIRES=uconv explicit on OS/2 and Unix.

nnooiissee

Updated

•

23 years ago

Blocks: 117359

Pete Collins (MDG)

Comment 85

•

23 years ago

Attached patch Full patch for unix - please review (obsolete) — Details — Splinter Review

Alec Flett

Assignee

Comment 86

•

23 years ago

Comment on attachment 68380 [details] [diff] [review]
Full patch for unix - please review

how about a diff -bw so we can actually see what changed? :)

Pete Collins (MDG)

Comment 87

•

23 years ago

Attached patch diff -ubw so alec can read it (obsolete) — Details — Splinter Review

Pete Collins (MDG)

Comment 88

•

23 years ago

cc'ing chris seawood so he can look over the changes to configure.in

Chris, can you look at the autoconf addition in attachment 68399 [details] [diff] [review].

Thanks

--pete

hacker formerly known as seawood@netscape.com

Comment 89

•

23 years ago

Comment on attachment 68399 [details] [diff] [review]
diff -ubw so alec can read it

Since you're not actually testing additional CFLAGS, you should use
AC_LANG_CPLUSPLUS & AC_LANG_C instead of using _SAVE_CC.  Fix that && r=cls.

Attachment #68399 - Flags: needs-work+

Pete Collins (MDG)

Comment 90

•

23 years ago

Attached patch used AC_LANG_CPLUSPLUS & AC_LANG_C (obsolete) — Details — Splinter Review

Attachment #68380 - Attachment is obsolete: true

Attachment #68399 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 91

•

23 years ago

Attached patch changing bad variable name use 'aBufSiz' to 'bufLen' (obsolete) — Details — Splinter Review

Attachment #68663 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 92

•

23 years ago

Comment on attachment 68712 [details] [diff] [review]
changing bad variable name use 'aBufSiz' to 'bufLen' 

>+    // nbytes is simply the amount of characters in aBuffer + 1
>+    size_t nbytes = nsCRT::strlen(aBuffer)+1;
>+    char *str = NS_REINTERPRET_CAST(char *, nsMemory::Alloc(nbytes));
>+    if (!str) {
>+        if (wpathWasAlloced)
>+            nsMemory::Free(wpath);
>+        return NS_ERROR_OUT_OF_MEMORY;
>+    }
>+    nbytes = wcstombs(str, wpath, nbytes);

But a multibyte string may need 1, 2, or 3 bytes per PRUnichar, so you still
may be underallocating str.  If you can't use the wcstombs extension (where
passing 0 in with a NULL result buffer causes it to return the required
nbytes), then I repeat that you might have to (might as well) use wcstomb
iteratively.

Or maybe you could use the worst case size.  Is that 3 * the number of
characters?  Or could it be more?  We need advice from i18n folks.

/be

Attachment #68712 - Flags: needs-work+

Pete Collins (MDG)

Comment 93

•

23 years ago

> Or maybe you could use the worst case size.  Is that 3 * the number of
characters?  Or could it be more?  We need advice from i18n folks.

I think PRUnichars are always double byte. 

I think we would want to double the amount of chars present not triple it.

size_t nbytes = (nsCRT::strlen(aBuffer)+1) * 2;

So if there are 10 chars present, we want to allocate a 20 byte max size.
Two bytes per char.

This is wrong only if PRUnichars can be more than 2 bytes on some systems. 

--pete

Pete Collins (MDG)

Comment 94

•

23 years ago

Sorry, we are allocating for sys_wchar not PRUnichar.

Ok, same is true just need to allocate for the variable system wchar size.

--pete

Brendan Eich [:brendan]

Comment 95

•

23 years ago

PRUnichar is always uint16, or two bytes.  The problem is that an "mbs"
(multi-byte string) may need three bytes to represent some PRUnichars, but only
two or one for others.  S-JIS is one such charset encoding.

The worst case for a charset encoding that can express up to 65536 characters
has to be more than two bytes for any encoding that uses a variable number of
bytes per character, because there needs to be at least one bit in the first (or
only) byte that distinguishes a one-byte code from a two-byte code -- therefore
not all two-byte codes are available to represent the high-byte-non-zero part of
Unicode.  So three is a minimum multiplier for any charset that can express up
to 65536 characters.

Cc'ing ftang, but also: pete, why not use wcstomb/wcsrtombs (I noticed wcstomb
from your FreeBSD man page; I just found the similar-seeming wcsrtombs on my RH
7.1 Linux system) iteratively?

/be

Pete Collins (MDG)

Comment 96

•

23 years ago

Yea, i'll iterate through the chars and find out how many bytes they are using
wctomb.

I actually had this implemented already and thought it was unneecessary code
because I kept getting the same number as strlen. 

Now i realize, thats because i have a single byte charset i'm using here to test.

I'll post a patch tomorrow.

Thanks

--pete

Pete Collins (MDG)

Comment 97

•

23 years ago

Attached patch using wctomb to allocate str (obsolete) — Details — Splinter Review

Attachment #68712 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 98

•

23 years ago

Attached patch adding nbytes test for failure (obsolete) — Details — Splinter Review

Attachment #68831 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 99

•

23 years ago

> for(size_t i = 0; wpath[i] != L'\0'; i++) 

Brendan, ignore the lack of a space after the for i just fixed it.

for (size_t i = 0; wpath[i] != L'\0'; i++) 

--pete

Pete Collins (MDG)

Comment 100

•

23 years ago

Attached patch correct patch (obsolete) — Details — Splinter Review

Attachment #68858 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 101

•

23 years ago

Perhaps someone can apply this patch on a linux japanese box and we can move
closer to landing this.

Testing various flavors of n'x would be ideal before a check-in.

--pete

Alec Flett

Assignee

Comment 102

•

23 years ago

Comment on attachment 68860 [details] [diff] [review]
correct patch

actually, I think tmpStr is going to have to be even bigger. Linux defines it
as MB_CUR_MAX.. though I can't tell how much that is. UTF8 has up to 6 byte
characters, not sure about other charsets.

the linux manpage also states that "at most n bytes are written" - do we know
that "wctomb" will return 1 if we pass in a null byte? If not, I think we need
to adjust for the terminating null by incrementing nbytes.

I'm also not sure of the value of 
if (nbytes < 0) 
because nbytes is a size_t (which is probably unsigned) - won't this statement
always be false?

Pete Collins (MDG)

Comment 103

•

23 years ago

     If mbchar is NULL, the mblen(), mbtowc() and wctomb() functions return
     nonzero if shift states are supported, zero otherwise.  If mbchar is
     valid, then these functions return the number of bytes processed in
     mbchar, or -1 if no multibyte character could be recognized or converted.


So 

if (nbytes < 0) 

Is valid . . .

MB_CUR_MAX is 4 bytes on freebsd i think it can be either 2 or 4 bytes on linux.

tmpStr needs to be big enough for sys_wchar_t which the max value is 4 byte, 2
byte on some linux systems, so that should be fine as well.

I beleive we are still cool w/ this patch.

Apply and fly . . .

The true test is to run it on a non-single byte charset machine.


--pete

Brendan Eich [:brendan]

Comment 104

•

23 years ago

size_t is unsigned, and any size_t variable comparison < 0 will never be true
("degenerate unsigned comparison").

/be

Pete Collins (MDG)

Comment 105

•

23 years ago

Right i need to do:

if (nbytes == size_t(-1))

That aside, i think we can test out this code. I am blowing out my tree at the
moment so i will post a revised patch later.


--pete

Pete Collins (MDG)

Comment 106

•

23 years ago

Attached patch adding checks for empty string params (obsolete) — Details — Splinter Review

BTW: I forgot to mention, if you apply this patch you will need to run autoconf
and reconfigure.

--pete

Attachment #68860 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 107

•

23 years ago

pete: but what about using MB_CUR_MAX? What if the system charset has 6
bytes-per-char (like UTF8)

Pete Collins (MDG)

Comment 108

•

23 years ago

Yea, the compiler warns when i use MB_CUR_MAX.

warning: ANSI C++ forbids variable-size array `tmpStr'

How about i just do something like this:

// set a 6 byte max size
#define MB_CHAR_MAX 6

so it is clear why we are using 6 here.

BTW: I thought the largest size of wchar_t on unix is 4 bytes . . . .

-pete

Alec Flett

Assignee

Comment 109

•

23 years ago

I just looked up why that was happening. The problem is that MB_CUR_MAX actually
depends on your current, runtime locale....on linux it is defined as
(__ctype_get_mb_cur_max ())

So maybe you can just make it very large (like 16) and NS_ASSERT that MB_CUR_MAX
> 16.. I just don't think that 4 is sufficient...you gotta hope that 16 would be
enough, though :) Besides, stack is cheap - there's no excess runtime cost to
using a 16-byte buffer vs. a 4 or 6 byte buffer.

Also, this is MB cur max, not WC cur max - this is the maximum number of bytes
used to REPRESENT a single wchar_t in the current, probably multi-byte charset...

Pete Collins (MDG)

Comment 110

•

23 years ago

Attached patch we should be getting close (obsolete) — Details — Splinter Review

Attachment #69601 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 111

•

23 years ago

Comment on attachment 69733 [details] [diff] [review]
we should be getting close

a few more things:
1) in UCS2toFS, explain why we're doing the for loop to determine the wctomb
length
2) in the assertion, explain why this is bad, so if someone hits the assertion,
they understand why, and can at least guess at a fix
3) add some whitespace, and lots of it! :) There isn't a single line of
whitespace in any of these functions, and it it just makes it hard to read..
you need whitespace between every significant block of code.

4) ack! according to the man page, mbstowcs returns the number of CHARACTERS,
not the number of bytes.. which means you're probably only allocating half as
much as you want in FStoUCS2 - that should clean up this:
+    if (sizeof(PRUnichar) == sysWideCharSize) {
+	 wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes));
+    } else {
+	 wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes *
sizeof(PRUnichar)));
+    }


because we know that we want to allocate nchars * sizeof(sys_wchar_t), no
matter what.

Attachment #69733 - Flags: needs-work+

Brendan Eich [:brendan]

Comment 112

•

23 years ago

I hope the compilers we use can figure out that the if condition is a compile
time constant:

+    if (sizeof(PRUnichar) == sysWideCharSize) {
+        wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes));
+    } else {
+        wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes *
sizeof(PRUnichar)));
+    }

But, size_t sysWideCharSize should be const, not a variable.  I would rather you
used a #define, actually -- or just spell out the sizeof(sys_wchar_t), it's not
that long or ugly.

/be

Pete Collins (MDG)

Comment 113

•

23 years ago

Attached patch a little bit closer . . . (obsolete) — Details — Splinter Review

Attachment #69733 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 114

•

23 years ago

Attached patch removed final size_t < 0 comparison (obsolete) — Details — Splinter Review

Attachment #70443 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 115

•

23 years ago

Comment on attachment 70450 [details] [diff] [review]
removed final size_t < 0 comparison 

ok, I find the use of "nbytes" to continue to be a little confusing (would have
preferred "nchars") but to each his own :)
I'll give my tentative r=alecf here, under the assumption that brendan will be
the sr=

Attachment #70450 - Flags: review+

Brendan Eich [:brendan]

Comment 116

•

23 years ago

Comment on attachment 70450 [details] [diff] [review]
removed final size_t < 0 comparison 

>+    size_t nbytes = 1; 
>+NS_ASSERTION(MB_CUR_MAX <= 16, "this system has exceeded the allocated 16 byte mbchar max");

Nit: indent assertions as you would any other code -- if overlong line results,
break after , in actual param list.

>+
>+    /** 
>+     *  here we are determining the number of bytes processed in
>+     *  tmpStr and increment nbytes to properly allocate for str
>+     */

Nit: doc-comment in a .cpp file is wasted effort, I think.

>+static nsresult
>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+
>+    if (nsCRT::strlen(aBuffer) == 0)
>+        return NS_OK;

Don't call nsCRT::strlen, it's going away soon.  Don't call strlen just to test
for an empty string, it can take too long on non-empty strings.  Just test if
*aBuffer == '\0'.

>+
>+    size_t nbytes = (nsCRT::strlen(aBuffer)+1) * sizeof(sys_wchar_t);

Nit: spaces around + ?

>+
>+    // add space for wchar null terminator
>+    nbytes += sizeof(sys_wchar_t);

Wait, didn't you just do that already with the +1 in the nbytes initializer
factor that's multiplied by sizeof(sys_wchar_t)?

>+    sys_wchar_t *wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(nbytes));
>+
>+    if (!wpath)
>+        return NS_ERROR_OUT_OF_MEMORY;
>+
>+    nbytes = mbstowcs(wpath, aBuffer, nbytes);
>+
>+    if (nbytes == size_t(-1)) {
>+        nsMemory::Free(wpath);
>+        return NS_ERROR_FAILURE;
>+    }
>+
>+    size_t len = 0;
>+    PRUnichar *wbuf;
>+
>+    // increment one for terminating null wide character
>+    ++nbytes;

Why not use + 1 in as you did earlier?	No point in ++'ing nbytes if you're
going to overwrite it here:

>+    nbytes = nbytes * sizeof(sys_wchar_t);

Do use *= if possible, but here I agree with alecf: the nbytes usage from
mbstowcs's call till here should be a new nchars var.

>+    wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc(nbytes));

You don't need to set nbytes if you do the add and multiply in the actual param
to Alloc.

alecf: some OSes use UTF-8, not our UCS-2 simple form, for filenames.  Isn't
Linux moving that way?

/be

Attachment #70450 - Flags: needs-work+

timeless

Comment 117

•

23 years ago

brendan, i think you're wrong about javadoc being useless in cpp files, 
http://unstable.elemental.com/mozilla/build/latest/mozilla/xpcom/dox/classnsDeq
ue.html is the url i'd watch, note that it says at the bottom:

The documentation for this class was generated from the following files:
         nsDeque.h
         nsDeque.cpp

I don't know how long it takes for unstable to update, but as of now it doesn't 
have the updated deque comments.

Brendan Eich [:brendan]

Comment 118

•

23 years ago

timeless: I wrote that doc comments in .cpp files were "wasted effort", which is
not to say "useless".  Not only will few people check
unstable.elemental.com/mozilla and read them, the particular one that pete wrote
seems to me just a plain old comment, not worthy of extraction out of context
into some kind of documentation.  It uses no @ keywords, defines no
interface(s), etc.  Let's please not start using /** ... */ everywhere a major
C-style comment is appropriate -- and pete, consider using // comments even for
multi-liners -- this is C++ after all.

/be

Pete Collins (MDG)

Comment 119

•

23 years ago

I should have time to clean up this patch tomorrow.

Brendan, i am being consistent w/ all of the other multi-line comments in
nsLocalFileUnix.cpp that i cleaned up way back when.

I favor this style, i find it easier to read and cleaner looking. 

To be honest, in a perfect world all multi-line cpp comments should be uniform.
This particular file is uniform.

--pete

Pete Collins (MDG)

Comment 120

•

23 years ago

Attached patch updated patch w/ Brendan's suggested changes (obsolete) — Details — Splinter Review

Attachment #70450 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 121

•

23 years ago

Comment on attachment 70803 [details] [diff] [review]
updated patch w/ Brendan's suggested changes

transferring my r=alecf

Attachment #70803 - Flags: review+

Judson Valeski

Updated

•

23 years ago

Keywords: topembed

Pete Collins (MDG)

Comment 122

•

23 years ago

Perhaps since we have review and are now pending Brendans sr, it might be a good
time to apply this to a linux box setting the default system locale to JA.

--pete

Alec Flett

Assignee

Comment 123

•

23 years ago

*** Bug 41384 has been marked as a duplicate of this bug. ***

Asa Dotzler [:asa]

Updated

•

23 years ago

Keywords: mozilla1.0+

Alec Flett

Assignee

Comment 124

•

23 years ago

naoki, is there any chance you can test this patch on a japanese unix system
(Linux should be fine)

nhottanscp

Comment 125

•

23 years ago

I don't have JA system. I think IQA can test if the binary is available.

Pete Collins (MDG)

Comment 126

•

22 years ago

Ok, i will produce a linux binary soon.

Alecf, we want to land this for 1.0 right?


--pete

Alec Flett

Assignee

Comment 127

•

22 years ago

yes! we definitely want to land in 1.0

Doug Turner (:dougt)

Comment 128

•

22 years ago

Alec, do you have any API changes to either nsIFile or nsILocalFile, of so,
please cite and make this bug a blocker of  99160

No longer blocks: 66759, 117359

Alec Flett

Assignee

Comment 129

•

22 years ago

I don't, but don't remove those bug dependencies!

Blocks: 66759

Alec Flett

Assignee

Comment 130

•

22 years ago

where are we with this? I'm moving to 1.0 to keep it on the radar, but we've
really got to land this to get some time testing. brendan, we really need your
final review.

Target Milestone: mozilla0.9.9 → mozilla1.0

Pete Collins (MDG)

Comment 131

•

22 years ago

I haven't had time to build a linux binary for testing.
I'll try to do it today.


--pete

Brendan Eich [:brendan]

Comment 132

•

22 years ago

Pete, do you know what a doc-comment is?  This is one, and I do not think it
should be:

+    /**
+     *  here we are determining the number of bytes processed in
+     *  tmpStr and increment nbytes to properly allocate for str
+     */

Just remove the second * and it won't be a doc-comment (and you probably won't
want to have two spaces after the third, etc. stars).  Uniformity of comment
style is fine, but don't write doc-comments if you aren't writing extractable
javadoc.

/be

Brendan Eich [:brendan]

Comment 133

•

22 years ago

Comment on attachment 70803 [details] [diff] [review]
updated patch w/ Brendan's suggested changes

>@@ -1526,8 +1526,10 @@
> NS_NewLocalFile(const char *path, PRBool followSymlinks, nsILocalFile **result)
> {
>     nsLocalFile *file = new nsLocalFile();
>+
>     if (!file)
>         return NS_ERROR_OUT_OF_MEMORY;
>+
>     NS_ADDREF(file);
> 
>     if (path) {
>@@ -1537,6 +1539,302 @@
>             return rv;
>         }
>     }
>+
>     *result = file;
>+
>+    return NS_OK;
>+}

Speaking of uniformity, other functions below do not throw extra newlines
around such statements, and I don't see the point here.  Please minimize change
unless you're fixing some real readability problem (I don't see one here).

>+static nsresult
>+UCS2toFS(const PRUnichar *aBuffer, char **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+
>+    if (nsCRT::strlen(aBuffer) == 0)

I think I've said this before: don't call strlen to handle an empty string, it
wastes (possibly many) cycles on non-empty string cases where you need only
test (*aBuffer == '\0').

>+        return NS_OK;
>+
>+    sys_wchar_t *wpath;
>+    PRBool wpathWasAlloced = PR_FALSE;
>+
>+    if (sizeof(PRUnichar) == sizeof(sys_wchar_t)) {
>+        wpath = NS_CONST_CAST(sys_wchar_t *, (sys_wchar_t *)aBuffer);
>+    } else {
>+        // get the number of chars in aBuffer
>+        size_t bufLen = (nsCRT::strlen(aBuffer) + 1);
>+        // allocate number of chars * size of sys_wchar_t
>+        wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(bufLen * sizeof(sys_wchar_t)));
>+        wpathWasAlloced = PR_TRUE;

Now, I see a readability problem (minor, fix if you agree): why not add a
newline to make an empty line before the // allocate number of chars....

>+        /**
>+         *  copy each char from aBuffer into sys_wchar_t buffer
>+         *  so we can pass the system wchar_t to wcstombs
>+         */

Another doc-comment that should be a regular C-style major comment.

>+    if (wpathWasAlloced)
>+        nsMemory::Free(wpath);
>+
>+    if (nbytes == size_t(-1)) {
>+        nsMemory::Free(str);
>+        return NS_ERROR_FAILURE;
>+    }
>+
>+    *aResult = str;
>+
>+    return NS_OK;
>+}

I don't see the point in the extra newline before the final return, esp. in
light of earlier style.

>+
>+static nsresult
>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>+{
>+    NS_ENSURE_ARG(aBuffer);
>+
>+    if (aBuffer == '\0')

Cool, but the same technique applies to the const PRUnichar* aBuffer case.

>+        return NS_OK;
>+
>+    size_t nbytes = (nsCRT::strlen(aBuffer) + 1) * sizeof(sys_wchar_t);
>+
>+    // add space for wchar null terminator
>+    // nbytes += sizeof(sys_wchar_t);
>+    sys_wchar_t *wpath = NS_REINTERPRET_CAST(sys_wchar_t *, nsMemory::Alloc(nbytes));
>+
>+    if (!wpath)
>+        return NS_ERROR_OUT_OF_MEMORY;
>+
>+    size_t nchars = mbstowcs(wpath, aBuffer, nbytes);
>+
>+    if (nchars == size_t(-1)) {
>+        nsMemory::Free(wpath);
>+        return NS_ERROR_FAILURE;
>+    }
>+
>+    size_t len = 0;
>+    PRUnichar *wbuf;
>+
>+    wbuf = NS_REINTERPRET_CAST(PRUnichar *, nsMemory::Alloc((nchars + 1) * sizeof(sys_wchar_t)));

Initialize wbuf at its declaration.

Test for out of memory (null wbuf after it's set) before using wbuf.

>+
>+    // copy each wchar_t to a PRUnichar
>+    while (wpath[len] != L'\0') {
>+        wbuf[len] = wpath[len];
>+        ++len;
>+    }
>+
>+    wbuf[len] = 0;
>+    nsMemory::Free(wpath);
>+    *aResult = wbuf;
>+
>+    if (!*aResult)
>+        return NS_ERROR_OUT_OF_MEMORY;

Too late, the code above (any use or set of wbuf[len]) would have crashed on
null wbuf, if OOM.

>+
>+    return NS_OK;
>+}
>+
>+NS_IMETHODIMP
>+nsLocalFile::InitWithUnicodePath(const PRUnichar *filePath)
>+{
>+
>+    NS_ENSURE_ARG(filePath);
>+
>+    if (nsCRT::strlen(filePath) == 0)

Here we go again... :-(.  Please scour the code for all such bogus strlen
calls.

>+        return NS_ERROR_INVALID_ARG;
>+
>+    nsXPIDLCString tmp;
>+    nsresult rv = UCS2toFS(filePath, getter_Copies(tmp));
>+
>+    if (NS_SUCCEEDED(rv))
>+        return InitWithPath(tmp);
>+
>+    return rv;

Don't test success and overindent the normal case.  Use NS_FAILED to decide to
early-return rv, then fall into a least-indented call forwarding to the
InitWithPath method.

>+}
>+
>+NS_IMETHODIMP
>+nsLocalFile::AppendUnicode(const PRUnichar *node)
>+{
>+    NS_ENSURE_ARG(node);
>+
>+    if (nsCRT::strlen(node) == 0)

Ditto.

>+        return NS_OK;
>+
>+    nsXPIDLCString tmp;
>+    nsresult rv = UCS2toFS(node, getter_Copies(tmp));
>+
>+    if (NS_SUCCEEDED(rv))
>+        return Append(tmp);
>+
>+    return rv;

Ditto2.

>+}
>+
>+NS_IMETHODIMP
>+nsLocalFile::AppendRelativeUnicodePath(const PRUnichar *node)
>+{
>+    NS_ENSURE_ARG(node);
>+
>+    if (nsCRT::strlen(node) == 0)

etc. -- I'll let you find the rest.

>+        return NS_OK;
>+
>+    nsXPIDLCString tmp;
>+
>+    if (tmp.IsEmpty())
>+        return NS_OK;
>+
>+    nsresult rv = UCS2toFS(node, getter_Copies(tmp));
>+
>+    if (NS_SUCCEEDED(rv))
>+        return AppendRelativePath(tmp);
>+
>+    return rv;

etc.2

>+}

More below this, not individually cited here.

/be

Attachment #70803 - Flags: needs-work+

Brendan Eich [:brendan]

Comment 134

•

22 years ago

>>+static nsresult
>>+FStoUCS2(const char* aBuffer, PRUnichar **aResult)
>>+{
>>+    NS_ENSURE_ARG(aBuffer);
>>+
>>+    if (aBuffer == '\0')
>
>Cool, but the same technique applies to the const PRUnichar* aBuffer case.

Uncool -- shaver pointed out what my bleary eyes missed: no * before aBuffer,
making this a test of whether aBuffer is null!  Please fix that, and notice any
warnings (there should have been one here, I think -- your compiler mileage may
vary).

/be

Conrad Carlen (not reading bugmail)

Comment 135

•

22 years ago

CCing mkaply for OS2. This is in on Windows & Mac, almost there on Unix. Once we
have OS/2, the job is done.

Darin Fisher

Comment 136

•

22 years ago

the man pages (redhat 7.2) says that wcstombs and mbstowcs are not thread safe.
 this patch needs to use a PRLock to protect libc.  the man pages also make
reference to wcsrtombs and mbsrtowcs, which are reentrant versions of these
functions.

Darin Fisher

Comment 137

•

22 years ago

the current impl is also not threadsafe... we're probably just lucky right now
that no one is calling any of the unicode string methods on background threads.

if we switch to an UTF-8 API then these conversion methods will be run much more
often and most certainly from background threads (e.g., the file transport thread).

Darin Fisher

Comment 138

•

22 years ago

oh, and if we're going to put a lock around these calls, then why not allocate a
single global buffer of size (MAX_PATH * sizeof(wchar_t)) for this?

Daniel Veditz [:dveditz]

Comment 139

•

22 years ago

XPInstall runs on a separate thread and obviously does a lot of file stuff. What
conditions should I be testing to see if we are, in fact, in danger of threading
problems?

Darin Fisher

Comment 140

•

22 years ago

dveditz: if xpinstall calls any of the unicode string methods on a background
thread, you'd be in trouble.

Daniel Veditz [:dveditz]

Comment 141

•

22 years ago

The threading issue is on Unix only, right? The Windows code looks safe enough.

XPInstall currently calls the narrow nsILocalFile methods, but we've got
localization problems and were going to be converting to the Unicode forms. Are
the threading issues going to get fixed?

Darin Fisher

Comment 142

•

22 years ago

the threading issues must be fixed... also, it looks like OS/2 uses
nsLocalFileUnicode, so it'll need fixing as well... or OS/2 will need its own
customized unicode <-> fscharset conversion code.

Cathleen

Comment 143

•

22 years ago

topembed-
this is good for stand alone xpcom.
- by triage team (chofmann, cathleen, marcia)

Keywords: topembed → embed, topembed-

Mike Kaply [:mkaply]

Comment 144

•

22 years ago

So you want me to use operating system APIs to do my Unicode conversions in 
xpcom?

I would prefer not to. As it stands I was going to remove my use of operating 
system APIs in widget and gfx and switch to using Mozilla to get consistent 
conversions.

If we use OS APIs in some places and Mozilla APIs in others, you can end up with 
inconsistent conversion of Unicode, in particular with Japanese.

If you want me to use my OS APIs, I need someone to help me architect the right 
place to put them because right now I have custom unicode conversion functions 
in widget and gfx and I would prefer not to add a third implementation into 
xpcom (we don't have simple APIs to do this, we have to write some functions) - 
see:

http://lxr.mozilla.org/seamonkey/source/gfx/src/os2/nsGfxDefs.cpp#164

and

http://lxr.mozilla.org/seamonkey/source/gfx/src/os2/nsGfxDefs.cpp#242

Pete Collins (MDG)

Comment 145

•

22 years ago

I'll have an updated patch by weeks end.

--pete

Pete Collins (MDG)

Comment 146

•

22 years ago

The compiler flag -fshort-wchar causes wcstombs to fail every time on linux.
If i remove it, it works fine. 

Seawood, what do you suggest we do here? This is an autoconf problem.

--pete

hacker formerly known as seawood@netscape.com

Comment 147

•

22 years ago

I'm going to defer to Akkana & dbaron as to the importance of using
-fshort-wchar.  This doesn't affect our release builds as egcs 1.1.2 doesn't
support the flag but we may want to just beef up the autoconf test with an
explicit wcstombs test.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 148

•

22 years ago

-fshort-wchar is what allows NS_LITERAL_STRING to do its work at compile time
rather than run time.  That's becoming more and more important for performance
as we use it more.

(I'm not sure why a system API using wchar_t would be useful anyway when wchar_t
is 32-bit and PRUnichar is 16-bit, but I haven't read the whole bug.)

Darin Fisher

Comment 149

•

22 years ago

so i was curious about this wchar_t thing and i tried a little test program with
a call to mbstowcs.  i compiled the program with and without -fshort-wchar, and
in both cases it worked correctly.

here's the sample program:

  #include <wchar.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    wchar_t buf[256], *p;
    size_t rv;

    printf("sizeof(wchar_t) = %u\n", sizeof(wchar_t));

    rv = mbstowcs(buf, argv[1], sizeof(buf));

    p = buf;
    while (*p) {
      printf("0x%08x\n", (unsigned int) *p);
      p++;
    }
    return 0;
  }

so, under linux at least it looks like we could happily set -fshort-wchar and
still call the system wchar API.  how'd they do that anyways?! ;-)

Pete Collins (MDG)

Comment 150

•

22 years ago

So now you're gonna to try and make a lier out of me eh tough guy?

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int
main ()
{
  return printf("nbytes = %d\n", wcstombs(0, L"test", 0));
}


# compiled with -fshort-wchar

$ ./a.out 
nbytes = -1

 compiled w/out -fshort-wchar

$ ./a.out 
nbytes = 4


$ uname -a
Linux LinuxBox 2.4.9-13SGI_XFS_1.0.2 #1 Thu Nov 15 11:38:50 CST 2001 i686 unknown

--pete

Darin Fisher

Comment 151

•

22 years ago

pete: i'm running glibc 2.2.4 that comes with redhat 7.2... what version of
glibc are you testing?  (maybe it's a new feature of glibc)

Darin Fisher

Comment 152

•

22 years ago

pete: actually, i just tried your example, and i get the same results?!  now i'm
baffled by the results from my test and yours.

Pete Collins (MDG)

Comment 153

•

22 years ago

Darin, you tested 'mbstowcs' *not* 'wcstombs' which is the method I said was
failing.

The sticky issue is, what is the best way to deal w/ this?

Seems there is certain love for -fshort-wchar and these system API's break when
you are compiling w/ it.

A simple solution is, don't use -fshort-wchar 
But this is mozilla, and simple soultions aren't allowed.  ;-)

So i guess it comes down to what is more important. Breaking xpcom uconv
dependency for unix or the perf benifit of compiling w/ this flag on the unix
systems that actually support it.

--pete

Darin Fisher

Comment 154

•

22 years ago

pete: right, but wouldn't you expect wcstombs and mbstowcs to treat wchar_t
similarly?  that's what is so odd about this.

at any rate, once we figure out what to look for we should be able to come up
with some autoconf tests that'll tell us if we need to convert to UCS4 before
calling wcstombs, etc.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 155

•

22 years ago

Another simple solution would be to have configure figure out the size of the
type that wcstombs takes (perhaps just by defining SYS_WCHAR_T to be |wchar_t|,
unless we use -fshort-wchar, in which case you could make it |unsigned int|). 
(Note that on some Unix-ish platforms, e.g., AIX, wchar_t is 2-byte by default
-- see bug 54564 comment 32 and bug 54564 comment 34.)

Darin Fisher

Comment 156

•

22 years ago

ok, i give up... i can't reproduce my results from yesterday.  this is so wierd!

Pete Collins (MDG)

Comment 157

•

22 years ago

Yea, i'm working on dbarons idea. Trouble is, wcstombs says it wants 4 bytes but
when i give it a 4 byte unsigned int, it still fails when using the flag.

--pete

Pete Collins (MDG)

Comment 158

•

22 years ago

Ok, can't get wcstombs to work using -fshort-wchar no matter what . . 

However, wctomb works "fine and dandy" w/ or w/out the flag.

So i will change the current implementation to use this method instead.

Then, we'll see what else breaks . . .

** One time, 1978. August. For about an hour. I was both fine and dandy at the
same time. But nobody asked me how I was. I coulda told 'em, "Fine and dandy!" I
consider it a lost opportunity. **  -- george carlin


--pete

Darin Fisher

Comment 159

•

22 years ago

Attached file conversion routines using wcrtomb/mbrtowc (not a complete solution) (obsolete) — Details

so, my patch for bug 129279, requires that this bug be fixed.  so, i hacked
together some conversion routines that work under glibc2.2.4... there's some
work that would have to be done to make this portable.	namely:

1- i'm using the reentrant versions of wctomb/mbtowc which probably aren't
universally available.	if not available, then we need to protect the
conversions functions with a mutex.

2- i'm assuming wchar_t is sized correctly for use with wcrtomb/mbrtowc, which
is not true if compiled using -fshort-wchar.  we'd need some autoconf lovin to
figure out what the correct data type is.

3- i'm making the assumption that file paths in native encoding won't exceed
PATH_MAX bytes.  is this valid?  if so, it allows us to avoid a heap
allocation.

Darin Fisher

Comment 160

•

22 years ago

actually, if my assumptions about PATH_MAX are correct, then the first part of
convert_ucs2_to_native can go away.  there's no need to precalculate the length
of the native string.

Mike Kaply [:mkaply]

Comment 161

•

22 years ago

Attached patch nsLocalFileOS2 fixes (obsolete) — Details — Splinter Review

This is an OS/2 version.

It's essentially the Unix version with new versions of UCS2toFS and FStoUCS2.

I also had to hook startup and shutdown of nsLocalFile so I only have to create
the converters once.

Pete Collins (MDG)

Comment 162

•

22 years ago

Another good one.

int
main ()
{

  char *aBuffer = "test";
  wchar_t wpath[5];
  mbstowcs(wpath, aBuffer, strlen(aBuffer) + 1);

  printf("wpath[0] %c\n", wpath[0]);
  printf("wpath[1] %c\n", wpath[1]);
  printf("wpath[2] %c\n", wpath[2]);
  printf("wpath[3] %c\n", wpath[3]);
  
  return 0;
}


# with -fshort-wchar

$ ./a.out 
wpath[0] t
wpath[1] 
wpath[2] e
wpath[3] 

# w/out -fshort-wchar

$ ./a.out 
wpath[0] t
wpath[1] e
wpath[2] s
wpath[3] t

Alec Flett

Assignee

Comment 163

•

22 years ago

Comment on attachment 77533 [details] [diff] [review]
nsLocalFileOS2 fixes

is there any extra error recovery you can do if you find that the worst-case
memory allocation isn't enough? 
I figure *8 should give you PLEANTY but I figured I'd check.
And actually, when going from native->UCS2, it seems like the number of UCS2
characters should always be < the number of native characters, no? so the *4
isn't really necessary? Or maybe I'm overlooking something :)

sr=alecf with or without the *4

Attachment #77533 - Flags: superreview+

Pete Collins (MDG)

Comment 164

•

22 years ago

Attached patch works on linux and freebsd (obsolete) — Details — Splinter Review

This patch actually works on linux and FreeBSD.  -f-short-wchar or not

I don't have any time to clean it up. 

Darin or Alec, please feel free to scrub it some, otherwise it will have to
wait until some time this weekend.

--pete

Pete Collins (MDG)

Comment 165

•

22 years ago

Attached patch tested and cleaned up (obsolete) — Details — Splinter Review

I'll take some review on this patch

--pete

Pete Collins (MDG)

Comment 166

•

22 years ago

Attached patch cleaned up a few more things (obsolete) — Details — Splinter Review

Attachment #70803 - Attachment is obsolete: true

Attachment #77494 - Attachment is obsolete: true

Attachment #77649 - Attachment is obsolete: true

Attachment #77774 - Attachment is obsolete: true

Pete Collins (MDG)

Comment 167

•

22 years ago

Attached patch forgot the most important part of the patch, to remove 'uconv' dependency from Makefile — Details — Splinter Review

This should be solid enough for a review

--pete

Attachment #77778 - Attachment is obsolete: true

Mike Kaply [:mkaply]

Comment 168

•

22 years ago

Hmm. I made a comment that never showed.

My *8 is overkill, should be times 4.

The new GB codepage can have 4 bytes representing a native character, so worst 
case UCS2 to Native is 4 bytes per one unicode char.

My understanding is that there are some UCS2 chars that are 4 bytes, so native 
to UCS2 would be a 4 byte worst case.

I can add some error checking in there though.

Darin Fisher

Comment 169

•

22 years ago

pete: your patch is not thread safe.  we should either use the reentrant forms
of these functions (e.g., wcrtomb and mbrtowc) or protect the wctomb/mbtowc
calls with a PRLock.  also, there's no need to precalculate the length of the
resulting string when converting from wide to narrow.  you can just push a
PATH_MAX sized buffer on the stack, fill the stack buffer with repeated calls to
wcr?tomb and then finish off with a call to strdup.

Pete Collins (MDG)

Comment 170

•

22 years ago

 I was under the impression that none of the system calls are . . .
What good is wrapping these calls w/ PRLock and leaving stat and the other calls
alone?

NS_IMPL_THREADSAFE_ISUPPORTS2

What does this do? 

--pete

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 171

•

22 years ago

System calls (calls that request services from the kernel) are all threadsafe,
everywhere I've ever seen threads.  Don't worry yourself about system calls.

wctomb and mbtowc are library calls, not system calls, and are indicated in my
man page as not being multi-thread safe.  Darin: using a PRLock won't be enough,
I suspect: what if we're racing with other uses of wctomb in embedder/component
code, rather than just in this local file code?  We need wcrtomb/mbrtowc, likely.

NS_IMPL_THREADSAFE_ISUPPORTS* provide threadsafe implementations of the basic
nsISupports methods (AddRef and Release; QI doesn't have any state) through use
of the NSPR atomic increment/decrement primitives.

Darin Fisher

Comment 172

•

22 years ago

shaver: yeah, good point... a lock wouldn't help us :(  do all platforms (that
we care about) provide the reentrant forms of these functions?  i was afraid
that some might not.  others, might use thread local storage to solve the
problem.  that's what a number of UNIX platforms do to make gethostbyname (for
example) reentrant.  wtc?

Pete Collins (MDG)

Comment 173

•

22 years ago

What about an autoconf test?

HAVE_REENTRANT_WC_API 

or something . . .

Systems that have the reentrant conversion api's use them. Systems that don't,
don't use them.

Or would Pthreads work here?

--pete

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 174

•

22 years ago

In the case where we don't have a reentrant wctomb/mbtowc, we probably need to
use our own with lock-wrappers _everywhere_ in Mozilla, and warn
embedders/extenders that they should use it as well.  (glue library?)

I don't see how pthreads help us -- what do you have in mind?

Darin Fisher

Comment 175

•

22 years ago

FWIW: i did a little survey of tinderbox-ports: Sun and Linux boxes provide
wcrtomb/mbrtowc, whereas IRIX and HPUX boxes don't.  i didn't check AIX.

Wan-Teh Chang

Comment 176

•

22 years ago

Darin,

I am not familiar with wctomb/mbtowc, so I don't know if the
platforms we care about provide reentrant forms of these
functions.  Sorry.

Pete Collins (MDG)

Comment 177

•

22 years ago

Let me try and understand the situation better.

We wrap these libc conversion calls w/ pthread_create and things are cool as long
as the embedder doesn't make a seperate raw call to one of these API's in our
process space.

Is this correct?

--pete

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 178

•

22 years ago

Uh.

When you say "wrap these calls with pthread_create", do you mean "explicitly
create a thread for each call to wctomb, and then somehow synchronize with that
thread later, so that we know the conversion is done"?  I can't understand how
that would help, but I can't come up with an alternate interpretation for that.

Beside the fact that we don't want to use raw pthread-API calls if we can avoid
them, _adding_ threads to the system seems a very odd way to deal with a lack of
threadsafety.

What am I missing?

Darin Fisher

Comment 179

•

22 years ago

no, not pthread_create.  we want to protect the calls to wctomb/mbtowc with a
PRLock.  see PR_NewLock, PR_DestroyLock, nsAutoLock.  then provided no one calls
wctomb/mbtowc outside of these locks, we're ok.  to do it right, we should
really move this conversion stuff to some service or extern libxpcom methods, so
other components requiring wctomb/mbtowc would be able to use the methods
safely. NSPR!

Pete Collins (MDG)

Comment 180

•

22 years ago

Agree, what do wide char conversion routines have to do w/ nsLocalFile class?
Sounds like a plan.  Someone will have to facilitate this. 
I provided a patch, tested and working. I don't have the time to get sucked in
any deeper right now. 
I have two boxes here running smoothly w/out xpcom uconv dependency.

"Pretty please w/ sugar on top,  . . . "

--pete

Pete Collins (MDG)

Comment 181

•

22 years ago

What do you want to do with this guys? 

Mark 1.01?

I assume the win and mac patches that already landed are "thread safe" right?

--pete

Mike Kaply [:mkaply]

Comment 182

•

22 years ago

Should I try to get approval and get the OS/2 fixes in now or does it matter?

Darin Fisher

Comment 183

•

22 years ago

this needs to go in for mozilla 1.0... bug 129279 depends on this bug.

Depends on: 129279

Pete Collins (MDG)

Comment 184

•

22 years ago

Well, someone needs to advise here. What is it going to take to land this sucker
in the near term?

--pete

Darin Fisher

Comment 185

•

22 years ago

i can take care of landing the XP_UNIX part when i land my changes for bug
129279. that won't be a problem.  and then for OS/2, it's really up to whatever
mkaply wants to do.

Mike Kaply [:mkaply]

Comment 186

•

22 years ago

My change can land any time. I just need a reviewer.

Mike Kaply [:mkaply]

Comment 187

•

22 years ago

Attached patch OS/2 only patch (obsolete) — Details — Splinter Review

OK, this should be the final OS/2 fix.

I forgot that on OS/2, filenames are limited to CCHMAXPATH(260) anyway, so why
use really big buffers, use CCHMAXPATH.

I also added asserts just in case things are too big.

Attachment #77533 - Attachment is obsolete: true

jhp (no longer active)

Comment 188

•

22 years ago

Comment on attachment 79452 [details] [diff] [review]
OS/2 only patch

r=pedemont

Attachment #79452 - Flags: review+

Alec Flett

Assignee

Comment 189

•

22 years ago

Comment on attachment 79452 [details] [diff] [review]
OS/2 only patch

doh! Just noticed this:

>+NS_IMETHODIMP
>+nsLocalFile::AppendRelativeUnicodePath(const PRUnichar *node)
>+{
>+    NS_ENSURE_ARG(node);
>+
>+    if (nsCRT::strlen(node) == 0)
>+        return NS_OK;
>+
>+    nsXPIDLCString tmp;
>+
>+    if (tmp.IsEmpty())
>+        return NS_OK;
>+


and I was about to sr= too! :)

Attachment #79452 - Flags: needs-work+

Mike Kaply [:mkaply]

Comment 190

•

22 years ago

Attached patch New patch — Details — Splinter Review

I have no idea where that came from.

The only OS/2 specific stuff in here is the UCStoFS and FStoUCS stuff.

The rest of the APIs I block copied from nsLocalFileUnix.

Attachment #79452 - Attachment is obsolete: true

Alec Flett

Assignee

Comment 191

•

22 years ago

Comment on attachment 79764 [details] [diff] [review]
New patch

excellent. Then sr=alecf

Attachment #79764 - Flags: superreview+

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 192

•

22 years ago

Comment on attachment 79764 [details] [diff] [review]
New patch

a=shaver for 1.0 branch.

Attachment #79764 - Flags: approval+

Alec Flett

Assignee

Comment 193

•

22 years ago

Attached patch remove "uconv" from Makefile.in — Details — Splinter Review

only thing left was the removal of "uconv" from the Makefile.in - anyone want
to review? this is the last nasty dependency from xpcom! Thanks darin!

Chris McAfee

Comment 194

•

22 years ago

Comment on attachment 81718 [details] [diff] [review]
remove "uconv" from Makefile.in

r=mcafee, make sure this builds & stuff

Attachment #81718 - Flags: review+

Darin Fisher

Comment 195

•

22 years ago

Comment on attachment 81718 [details] [diff] [review]
remove "uconv" from Makefile.in

sr=darin

Attachment #81718 - Flags: superreview+

Alec Flett

Assignee

Comment 196

•

22 years ago

yay! marking this fixed.. thanks to:
darin and pete for unix work
conrad for mac work
mkaply for os/2 work
me for windows work :)

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 22 years ago

Resolution: --- → FIXED

Asa Dotzler [:asa]

Comment 197

•

22 years ago

adding fixed1.0.0 keyword (branch resolution). This bug has comments saying it
was fixed on the 1.0 branch and a bonsai checkin comment that agrees. To verify
the bug has been fixed on the 1.0 branch please replace the fixed1.0.0 keyword
with verified1.0.0.

Keywords: fixed1.0.0

You need to log in before you can comment on or make changes to this bug.