Closed Bug 147333 Opened 22 years ago Closed 22 years ago

Cannot load local files whose names contain Japanese/Chinese characters

Categories

(Core :: Internationalization, defect, P1)

Sun
Solaris
defect

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: masaki.katakai, Assigned: darin.moz)

References

Details

(Keywords: intl, regression, Whiteboard: [adt3 RTM] [ETA 06/21])

Attachments

(9 files, 8 obsolete files)

1.78 KB, text/plain
Details
3.83 KB, text/plain
Details
49.04 KB, patch
Details | Diff | Splinter Review
416.01 KB, patch
Details | Diff | Splinter Review
38.32 KB, patch
darin.moz
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
27.81 KB, text/plain
Details
1.86 KB, text/plain
Details
2.81 KB, patch
alecf
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
1.53 KB, text/plain
Details
This is regression. Try to browse japanese EUC file on file
picker on Solaris.

1. Start CDE in Japanese EUC locale
2. Prepare a file with filename = Japanese EUC
3. Start Mozilla
4. File -> Open File...
   filename becomes ????.txt

This works on RC1, RC2 but does not work on RC3.

This problem happens on only Solaris platform. I could not
find this on Linux (RH6.2JA).
This problem also happens at directory browsing when
I type "file://...." on URL field.
Sounds like fallout from the nsIFile api changes...
upgrading to blocker also; this should block 1.0 if nothing else, imo, and this 
is the only way I know to get people's attention on it.
Severity: critical → blocker
Keywords: regression
This works on UTF-8 locale but doesn't work on
Japanese EUC locale.

It seems that now mbtowc() is used for code
conversion for native<->ucs2 but if I remember
correctly, this works on only Linux platform.
How about another UNIX? AIX and HP-UX?

For Solaris, we need to use iconv() of Solaris
for the conversion or need to continue to use
Mozilla own converters here.

Keywords: intl
QA Contact: ruixu → ylong
cc dougt
darin, any ideas?
what version of solaris does this apply to?  i'm very surprised to hear that
mbtowc fails... is that really a common problem on solaris?
katakai: can you please give us some help? 
Ervin: 
can you help to test and work on this feature using Chineses locale? this
regression bug is expected to be fixed in Mozilla1.0. thanks
Jay
anybody tested in HP-UX/AIX/SGI?
It is also failed on Solaris 9 CDE chinese env.
Anyone can narrow down the suspicious patches?
My guess is that newer Unix systems have wide-char defined as unicode but older
systems do not.

What is the OS version on the systems that are seeing the problem?
widechar in Solaris is locale specific, usually not UCS2, so mbtowc() shouldn't
be used as the converter from native to UCS2, only iconv() or netscape specific
converter can be used in Solaris.
I suspect that the patch for this will not show up in the mozilla 1.0 tree
since we are about to tag the tree for 1.0 tonight and it doesn't seem that a 
well tested fix for this is likely to show up for this tonight.

However, it will be a few days before we actually release so if you can 
come up with a patch before the release date then we can release note
the patch and make sure that whoever does the solaris builds applies that patch
so that the mozilla 1.0 solaris binaries on ftp.mozilla.org will work.
perhps Dawn is right :-(

could any encoding guru help to see the patch for bug 129279?  
i can reproduce this bug with RC3,but rc2 is fine
I can reproduce this bug in ko/zh/zh_TW locales on Solaris with RC3.
but RC2 is OK. also this bug happen on Netscape 7.
hmm.. what version of netscape 7 are you testing?  netscape 7 PR1 does not
include the patch for bug 129279.  only builds after RC2 include that patch.

anyways, i can easily put together a patch for this, but i'll need someone to
test it out.  anyone here willing to be my patch buddy?

looks like redhat 7.3 (glibc-2.2.5) supports the iconv interface, so i'll go
ahead and develop a patch for this that works under linux.  it might even be
better to prefer iconv over mb[r]towc/wc[r]tomb in general... i'm not sure yet.

taking...
Assignee: yokoyama → darin
darin, I'll be glad to test your patch.
I believe the problem is assuming that mbtowc() and related functions return
Unicode.  Both the length and encoding of wchar_t are not specified in
ANSI/ISO C --  they are implementation defined.

On Solaris (and probably most of the older Unix implementations) wchar_t
is probabably NOT Unicode for all locales.

I think Katakai-san is correct that the cross-Unix solution should be to use
iconv().  I'm not sure if iconv() is supported on other platforms.

If my memory servers me correctly, the original concept was that wchar_t was
intended to be only an internal opaque representation of character data that
represents each character in a fixed length number of bytes.  That is why
ANSI/ISO C left the size and encoding of wchar_t implementation defined.

Using wchar_t can cause problems of portability and interoperability because
the size and encoding can differ across platforms and even compilers on the
same platform.  If wchar_t is used externally you can hit big-endian v.
little-endian incompatibility as well.
Ignore last comment about iconv() on non-Unix platforms.
Looking at the patch it appears that mbtowc() is only used in nsLocalFileUnix.cpp,
so we don't have to worry about using iconv() on non-Unices.
I believe mozilla has all the converters necessary. My recollection is that the
last time I talked to ftang about iconv he felt there were valid reason why 
we should not use them.
Can someone point to some good examples of code calling the internal
mozilla converters that would be used to replace the mbtowc() calls?
hi darin,if you need help,please call me
ok, i did some investigation... and it looks like iconv is supported as far back
as redhat 6.0, so we are fine using that for linux.  Mac OSX also provides
iconv, so that pretty much implies that *BSD systems should be covered.  i
suspect most unix platforms support iconv, so my patch will use iconv if
available and fall back on the existing implementation otherwise.

bstell: regarding using mozilla's uconv libraries: that is not an option. 
libxpcom cannot depend on any of mozilla's uconv libraries (i forget the bugs
off hand, but installer has intl problems otherwise).

i should have a patch by end of day tomorrow.
Status: NEW → ASSIGNED
Keywords: mozilla1.0, nsbeta1
Priority: -- → P1
Whiteboard: [adt1 RTM]
Target Milestone: --- → mozilla1.0.1
While I know you have immediate and pressing problems due to a last minute
addition of a major amount of code I would hope you have got 'buy-in' from
the i18n developers and maintainers. 

The choice to use iconv is a major architectural decision and I would hope we 
make these kinds of decisions carefully and not as a side effect of a fire 
fighting drill.
Brian, 
xpcom does not want a dependency on i18n.  We need to be able to go from unicode
to fs charset and back.  The system that xpcom runs on provides these
conversions.  Why should we use the i18n libraries?  Where is the benefit?  
Dougt: wouldn't you consider it odd if a someone decided to link in a new
3rd party html parser because they had issues with the mozilla html parser?

Shouldn't we make architectural decisions carefully?

Are you or darin planning to own all problems related to iconv such as
character conversions that differ from the i18n converters already in the
mozilla i18n code?

It seems a very high risk choice to include the nsiFile changes to very late
in the cycle. I do not see why this choice should make us give up on careful
architecture.
Has anyone done any *actual* testing to find out if iconv will work on all
platforms?

This very problem seems to be here because no one did the minimal reasonable 
testing before checkin to see if all systems had a problem.

Lets be careful and not repeat the same mistake.

I have a experience that I was trying to use iconv() for code
conversion in my program (XIM input method server). The codes
needed to be shared in Linux and Solaris. I was seeing some
issues and finally gave up to use iconv() on linux platform().
I'm sorry I could not remember the problems correctly... but

- iconv() on linux depends on its version
- iconv() on linux always uses big-endian (??)
- but Solaris iconv() uses big-endian on SPARC
  little-endian on x86
- need to check BE and LE iconv capability

It would be OK if we could focus on the specific platform, and
if we could check the target platform and iconv() capability
properly. But it's too difficult, I think. I think we will need
to use many ifdef ..PLATFORM.. or ifdef ..ICONV_VERSION... when
we use iconv(). (but I'm sorry I don't know the issues about iconv()
exactly)

Also, it would be good if we could find other multi-platform
products/opensourced application that use iconv(). Does
anyone know such products? Does the product support all
platform of Mozilla? If exists, that is good our example.
slow down, brian.  

>Dougt: wouldn't you consider it odd if a someone decided to link in a new
3rd party html parser because they had issues with the mozilla html parser?

I do not follow.  Please explain why switching the html parser out has anything
to do with the unicode->fs charset converter.

>Shouldn't we make architectural decisions carefully?

Are you seriously asking this question?  

>Are you or darin planning to own all problems related to iconv such as
character conversions that differ from the i18n converters already in the
mozilla i18n code?

Of course not.  

>It seems a very high risk choice to include the nsiFile changes to very late
in the cycle. 

What do *you* propose, brian?  Hand waving without an alternative just sucks. 
Give me an alternative for solaris, bsd, which doesnt pull in all of the
dependencies that i19n has!

>I do not see why this choice should make us give up on careful architecture.

You really can not be serious!  Show me one paper explaining this
"architecture"!  The "way it is now" doesn't mean anything was carefully designed?  

>Has anyone done any *actual* testing to find out if iconv will work on all
platforms?

Considering there isn't even a patch, i would think NO.

>This very problem seems to be here because no one did the minimal reasonable 
testing before checkin to see if all systems had a problem.

What is done is done.  We thought we did the minimal reasonable testing.  We
could have tested forever, but we were under the a large amount of pressure to
freeze API's and this was part of it.  Now with that said: Where the heck were
you?  Announcements were made, public builds were posted, you could have taked
the patch and helped us?  Ah.  It is far easier pointing out the obvious.

>Lets be careful and not repeat the same mistake.

Again, offer a suggestion which doesn't make you sound like a complete ass.  Do
you think that we want to make a similar mistake?  
what Masaki said.  lets special case the iconv on platforms which use the short
bus version of mbtowc.
my take on this is as follows:

1- we had to remove the libxpcom dependency on mozilla's uconv libraries.

2- we chose to use mb[r]towc/wc[r]tomb thinking (incorrectly) that these
converted between the native multibyte character set and unicode (UCS-4 in some
cases).  despite testing on a variety of different unix platforms, we were
unfortunately wrong about this.  (yes, i wish we had solicited help from those
who would have spotted this problem immediately, but i guess time pressures got
the best of us.  now that it has been determined that 1.0 will ship with this
bug, we can fortunately take our time and fix this bug carefully.)

3- iconv seems to be the only option on platforms that don't use unicode for
wide chars (internally).

of course, we're going to have to do a lot of testing.  i'm hoping that since we
are only interested in conversion to/from UCS-2 that there won't be too much
variation from platform to platform.  endian-ness definitely worries me.  i'd
like to think that iconv would interpret "UCS-2" to mean "UCS-2" using the
native endian-ness.. but perhaps that is just wishful thinking :/  ..remains to
be tested i suppose.
> Please explain why switching the html parser out has anything
> to do with the unicode->fs charset converter.

It is a comparison. Using iconv because the code for nsIFile was not
tested is a major architectural change just as using a 3rd party html
parser would be. I am very serious about this.

As to what we should do I strongly think we should not use iconv unless
someone who regularly works with the converters can state that this is
a safe and reliable thing to do.

I think perhaps due to the very late date we should do *nothing* right now
until we have figure out something we can live with. I recommend we just
release note this pending a plan.

Being in a (self imposed) fire drill does not mean we should give up on
careful architecture.

well, for architectural, and embedding reasons, we cannot use the i18n libraries
in nsIFile, period. it's just not an option. Put simply there were
initialization issues in XPCOM (i.e. initialization of XPCOM itself) that
required nsIFile access before the i18n libraries had been loaded. Darin
explored the option pretty extensively and I hope we don't have to argue that
point again. 

Thus we have to rely on an os-specific mechanism for doing the conversion. It
sounds like the real issue is that the trunk-baked mbstowcs/etc provides a 90%
solution for 1.0, and iconv(), while it might provide a 100% solution, has not
been tested.

Let's just wait for an iconv patch and bake it on the trunk while we wait for
1.0 to ship.. if it looks good then we can land it on the 1.0.1 branch when it
is created.
Please keep the install folks in the loop on any changes in the file implementation.
The choice to use iconv is an architectural decision.

Okay, we have problems using code that uses our own xpcom interfaces. That does
not imply the iconv is the answer.

Could anyone show me any discussions/emails/white-papers/studies on the subject
of using iconv in mozilla?

Who will own the iconv code and be responsible for its maintenance?

Right now based on what I see the decision to use iconv feels to me like very 
sloppy work. I will gladly retract that if anyone can show me the prepatory work 
that seems reasonable when making architectural decisions.
> Okay, we have problems using code that uses our own xpcom interfaces. That does
not imply the iconv is the answer.

This is not what anyone is directy saying, brian.  The chain of arguement goes
something like this:

xpcom needs to reduces its dependencies for embedding.  xpcom requires
unicode->fs converters.  the converters that i18n built require necko which
brings in the world.  break this conversion by using system calls.

The above decission has been made.  it was discussed for week and week in person
and made several newsgroup postings:

news://news.mozilla.org:119/3CCF08E2.4020802@netscape.com
news://news.mozilla.org:119/3C190770.701@netscape.com


> Who will own the iconv code and be responsible for its maintenance?

I will if no one else can.

> Right now based on what I see the decision to use iconv feels to me like very 
sloppy work. I will gladly retract that if anyone can show me the prepatory work 
that seems reasonable when making architectural decisions.

sorry that it "feels" like "sloppy work".  Maybe you can offered a better
solution.  Right now, you are just pissing people off with your arrogant
assessment of what we already know.  Stop complaining and do something!
The fact is this bug exists and we have to do something about it.
Masaki's comment (#30) pretty much describes what we need to do 
for Solaris unless there is a better way.

I agree with Alec. Let's wait for an iconv patch and bake it on the trunk.  
I do not understand how can the patch for 129279 be checked in without review
from my team at all. I said many many time that you cannot use mbtowc or iconv
to replace our internal converter. The reason you cannot use mbtowc is the
wchar_t is not always unicode. The reason you cannot use iconv is the conversion
may not always be there. 

We should back out the change in 129279 to fix this problem. Using mbtowc or
iconv will not solve your problem and will only increase the complexity to let
you solve it later. 

Remove dependency on the uconv stuff is a "mission impossible". I said it many
time in the past. I truly hope people will listen. 

Please don't try to tell me how good is wbtomc or iconv. I know those stuff very
well for more than 10 years. I reviewed the source code of wbtomc in SVR3.2 in
1989 (yes, 13 years ago).  Bob and I were one of the reviewer for X/Open when
they introduce iconv into XPG/4.  (XPG/3 only have iconv in command line, XPG/4
have it in api) . Both of them WON'T do the job for you. 

If you use iconv, you will introduce new depedency into the mozilla source code. 
also, the iconv may or may not have the conversion pair for the locale. 

If you use wbtomc, the run time code is wrong on MANY locales and MANY
platforms. Because many locales and many platform do NOT use Unicode as the
internal representation for wchar_t. In fact, there are platform define wchar_t
as one byte, 2 bytes or 4 bytes.

more problem about iconv
there are no guarantee that the value return by nl_langinfo will be the same
name used in iconv
also, there are no clear definitation about what is the name should be used for
iconv to indicate UCS2
.Dear darin:
Please make decision based on what "you know" not what "you suspect".

>------- Additional Comment #25 From Darin Fisher 2002-05-29 00:18 -------

>ok, i did some investigation... and it looks like iconv is supported as far back
>as redhat 6.0, so we are fine using that for linux.  Mac OSX also provides
>iconv, so that pretty much implies that *BSD systems should be covered.  i
>suspect most unix platforms support iconv, so my patch will use iconv if
>available and fall back on the existing implementation otherwise.

darin, please understand there are difference between
1. the iconv apis is available, and
2. the iconv implementation include all the converter we need

From my understanding, on a lot of platform, while iconv api itself is
available, the conversion implementation (from one specifc encoding to another
one) that we need may not be there. 
Also, even the conversion implementation is there, we may not know how to
involve it. The name used to label the from charset and to charset may not be
the same across platform.

Since you mention MacOS X. Have you check how many converters are available on
the iconv supplied by MacOS X?


>so that pretty much implies that *BSD systems should be covered.

This is a FALSE assumption. Yur assumption of MacOS X have it so all BSD have it
is a FALSE assumption. Here is what I KNOW. (not what I suspect)

form my understanding, BSD does not implement iconv. iconv is introduced during
the X/Open movement around 1989-1992. Both Bob Jung and I were in the Unix
International / International Working Group and Open Software Fundation /
International Working Group that time. The iconv api is added to XPG/4 in 1992.
And in that time the main BSD development already stop. 

The first iconv implementation I saw was happen in SVR4 from USL in 1991. 

>i suspect most unix platforms support iconv
Again, this is what you "suspect" , not what you "know". And I KNOW there are a
lot of unix platform do not support iconv. or even they support iconv, it does
not support the conversion implementation we need. or they suppor the
implementation but we don't know how to invoke it.

dougt:

>The above decission has been made.  it was discussed for week and week in person
>and made several newsgroup postings:
I don't remember any one talk to me in person. Who have you talk to in person
have the knowledge about the availability of wbtomc or iconv on different
platforms ?
> We should back out the change in 129279 to fix this problem. Using mbtowc or
> iconv will not solve your problem and will only increase the complexity to let
> you solve it later. 

FWIW, backing out just the changes to 129279 *will* break the build.  The build
structure has been changed since that checkin to reflect and enforce the fact
that xpcom no longer depends upon uconv.  Backing out the whole ball of wax is
possible but definitely not desirable as I'm sure developers like not having to
pull the entire tree (or 1/2 the tree using mcafee's script) just to build xpcom. 

> Remove dependency on the uconv stuff is a "mission impossible". I said it many
> time in the past. I truly hope people will listen. 

Pardon my disbelief, but why is that?  At the very least, why isn't possible to
 build uconv without bringing in necko and the rest of the world?  I may be
getting off target but if all of the changes are backed out, then we are going
to need some "uconv-lite" that we can direct people to build with the
"standalone" xpcom module that doesn't pull in the rest of the tree.  I'm still
mystified as to why we think it's a good idea to have our base library depend
upon a higher level component to work properly.  Perhaps this uconv-lite
shouldn't be part of the uconv module at all and should be a standalone lib on
the same level as xpcom (no, I'm not going to suggest stuffing more stuff into
xpcom).

so dougt, what is you experience using iconv?
dougt: would you kindly drop the personal attacks
Frank, Roy, Brian, Alec, Darin, and I just had a meeting.  We agreed that the
direction that we would like to take is to use iconv where available falling
back on zero-byte padding.  Darin is going to checkin a patch on the trunk (once
reviewed and stuff) which converts the 'nix platforms over to use iconv.  We
will need the help of platform owners and users to test these changes on
different locales.  Frank Tang will be sending mail to the unix newsgroup
shortly do ask for this help.  

Chris, this does add a new library requirement to mozilla.  Is there anything
special we should do in configure?
I'm going to assume that we're _not_ checking-in another 3rd party library into
the tree and that this will be an external dependency.  We'll want to add a
configure stopping test for the library in question.  Configure's libical test
for MOZ_CALENDAR should be a sufficient template.  And, of course, the
tinderboxes & build release machines will need to be updated before the patch
can land.

seawood: the systems we care about "should" already have support for iconv.  in
fact, under linux iconv is simply part of glibc (or so it seems now that i'm
happily invoking iconv under linux).  that said, my understanding is that on
some systems iconv is provided via libiconv.so, but i'm not sure where that applies.
darin, if you made the patch, we (Sun china browser team) can help you to test 
before it can land.
So, to back up what Frank said, a quick cursory check shows that the sysv
systems have the iconv routines and the bsd-based systems do not.  I have no
idea how complete those implementations are.  Wrt Macs, OSX does not come with
iconv.h.  On my OSX 10.1.4 box, iconv.h & libiconv.so are installed under /sw
which means that they came from fink.  The additional runtime dependency needs
to be run by drivers as well.  (Not that anyone cares but BeOS appears to also
be missing an iconv implementation though a port is available.  Not sure about
our other fringe platforms: OpenVMS, QNX, etc.)


seawood: i discussed the potential for a iconv dependency with drivers today, so
they're aware of the situation.
this patch isn't final by any means... it just switches nsLocalFileUnix over to
using iconv.  please let me know how this patch works out... thx!!
The prototype of iconv in Solaris is:
iconv(iconv_t, const char **, size_t *, char **, size_t *);

So we should change
+        res = iconv(gNativeToUnicode, (char **) input, &inLeft,
+                                      (char **) output, &outLeft);
and
+        res = iconv(gUnicodeToNative, (char **) input, &inLeft,
+                                      (char **) output, &outLeft);
to (const char **) input

Tried xpcom's clobber build, it still can not convert chinese char correctly.
Will try full clobber build later.
yes,i still could not build successfully,so we must change some code like that
#if defined(SOLARIS) 
res = iconv(gNativeToUnicode, (const char **) input, &inLeft,
                              (char **) output, &outLeft);
#endif
 
full clobber build completed, but it still failed to convert chinese char.
I am adding myself.  This sounds similar to the failure
we are seeing on AIX...  So am going to keep an eye on this.
FYI  AIX rc3 is dead in the water.
We are seeing similar regressions on AIX from RC2->RC3. Specific symptoms include:

1) Starting Mozilla with a clean profile works on first run, however on
subsequent runs the Profile Manager comes up, saying the migrated profile can
not be run.
2) No files are visible in the File->Open File dialog. (Displays the message
"You do not have the permissions necessary to view this directory").
3) Creating a new profile in the Profile Manager is impossible - the folder path
is made up of garbage characters.

I will test this iconv patch on AIX and let you know if it addresses these problems.
I don't think this is the best fix for this, but for the branch
I think it is fine.  For the trunk we probably want to test for
iconv_open and if we find it have a USE_ICONV define and if
USE_ICONV is set then we should link agsinst EXTRA_ICONV_LIBS
in xpcom/build/Makefile.in.  EXTRA_ICONV_LIBS should also be
set in configure

NOTE: AIX also needs the previously mentioned (char **) -> (const char **)
patch
Keywords: mozilla1.0mozilla1.0.1
Summary: regression: can not browse local file on file picker → Cannot load local files whose names contain Japanese/Chinese characters
fwiw I asked a QNX user and it doesn't appear to have iconv either :)
so, after making the (const char**) fix and adding -liconv (if necessary) what
are the results?  comment #56 says that there are problems with chinese
characters... what about others?  thx!
These two patches seem to fix the problems on AIX. I am able to start Mozilla
multiple times, create profiles, and use the filepicker dialog.
pkw: that's great news... but which locales did you test?  thx!
Just en_US. We were seeing general problems that were not specific to
Japanese/Chinese locales.
For the record, OpenVMS does appear to have iconv_open and iconv_close (though I
have never tried to use them). They are part of the standard C library and so I
don't need a -liconv or anything like that at link time.
Attached patch add configure test for iconv (obsolete) — Splinter Review
This patch adds a configure test for iconv. Since I didn't want to duplicate
the iconv testing code, I look for iconv in libiconv first, then run the link
test for iconv, iconv_open & iconv_close.  If the test passes, _ICONV_LIBS
(either -liconv or empty) is appended to XPCOM_LIBS. Because some platforms
require symbols to be resolved at link time,-liconv must be used every place
-lxpcom is used. If we decide to use iconv directly outside of xpcom, we may
want to use a separate variable.

I also fixed a tiny bug with the USE_STDCONV case where mbrtowc & wcrtomb are
not available.

Because the test to use iconv or stdconv is in the source file, we may wind up
with an unneeded dependency by linking in libiconv but using the stdconv
routines if nl_langinfo isn't set.  I saw this on my OSX build.
Attachment #85557 - Attachment is obsolete: true
Attachment #85604 - Attachment is obsolete: true
 Darin,
As for your patch can't work correctly on Solaris, I think it's for there is no
support for directly converting between locale codeset and UCS-2 codeset, so the 
iconv can't work well when trying to directly convert codeset between locale and
UCS-2. Maybe should make two step convert. First from locale to UTF8 and then
UTF8 to UCS-2.

from Ervin:
On Solaris iconv provide conversion between UTF-8 and the value return by
nl_langinfo CHARSET.   and provide the conversion between UTF-8 and UCS-2.   
seems no direct conversion between the value return by nl_langinfo CHARSET and
the UCS-2(little-endian/big-endian).




Antonio  has verified my guess. He has write a segment of code as sample.
It work well in Solaris. Antonio will write mail to Darin for it.

Thank Antonio !
Thanks for john's advice,i had changed some code of
nsNativeCharsetUtils.cpp,then make mozilla's filepicker show chinese word
correctly,i had changed some code for solaris like that
 char ** tmp;
        char *tmpoutput=new char[inLeft];
        *tmpoutput ='\0';
        char *tmpinput;
        tmpinput = tmpoutput; 
        size_t tmpoutLeft = inLeft;
        
       
        res = iconv(gUnicodeToUtf8, (const char **) input, &inLeft,
                                      (char **) &tmpoutput, &tmpoutLeft);
        tmpoutLeft = (PRUint32)(*inputLeft*2 - (PRUint32)tmpoutLeft); 
        res = iconv(gUtf8ToNative, (const char **) &tmpinput, &tmpoutLeft,
                                      (char **) output, &outLeft);     
it is means convert local chrset to UTF-8 then convert UTF-8 to UCS-2,because if
soalris can not convert from GBK to UCs-2.
i hope it can help you,if you have problem ,please let me know.
belong browser-china@sun.com
Hmm. Maybe we should try directly, then try via UTF8. Are there any other
'common' intermediate formats which are guaranteed not to lose bits we want?
iconv --list on a glibc system lists the valid locales (Although again, it
doesn't promise that you can use them all)

It does, however, contain both 'UCS2' and 'UCS-2'. Antonio, does using UCS2
instead of UCS-2 in darin's patch work?
As timeless mentioned in Comment #60, QNX doesn't have iconv.

I just tried with opening a local file with filename in Chinese
using the File->Open File ...
1) RC2, doesn't work. It says "File xxxx doesn't exist".
2) RC3, works! It opens and display the file fine.

So, whatever changed in RC3, though broke Solaris, fixed it for QNX :)

Frank
only if we can convince all the solaris users to switch to qnx....

For QNX, maybe it is best to continue using mbtowc.  is the wchar unicode on QNX?
poking around on worms, i discovered something interesting about solaris.  the
available conversion libraries can be found in /usr/lib/iconv/*.so ...
apparently it dynamically loads the appropriate library named like this:
"FROM%TO.so" ... (e.g.: eucJP%UTF-8.so)... there is no eucJP%UCS*.so, at least
not on worms.
solaris supports a much smaller set of conversions from UCS-2 directly.  i'm
thinking that we should always try to convert directly from/to UCS-2, but
fallback on UTF-8 if required.
on mothra:

#uname -a
SunOS mothra.mozilla.org 5.7 Generic_106541-12 sun4u sparc

#ls UTF*JP*
UTF-8%ISO-2022-JP.RFC1468.so  UTF-8%eucJP.so
UTF-8%ISO-2022-JP.so          UTF-8-Java%eucJP.so
#ls *JP*UTF*
ISO-2022-JP%UTF-8.so  eucJP%UTF-8-Java.so   eucJP%UTF-8.so

this is the contents of /usr/lib/iconv on mothra, which runs solaris 5.7
Ref my comment #72
Sorry my test was messed up. I was testing a file on a foreign
filesystem (vfat).
I just test it again with the QNX's native filesystem, and we
have the same problem as others: rc2 works fine (with chinese
filename), but rc3 gives "file doesn't exist".
My guess is QNX's vfat filesystem support probably has a bug which masked
the mozilla problem.
Frank
Frank: does QNX provide the iconv API?
RC3 for BeOS PC loads local files which contain russian utf-8 charcaters properly,
though it shows such name in URL bar in "escaped" form:
BeBookindex%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9.html
I'm not familiar with this bug 147333 yet, so don't know, is it proper behaviour
or not.
Only thing i have to say here, is that in BeOS UTF-8 is native. So, theoretically,  
all correct file names are(should be) UTF-8.
darin, 
as timeless mentioned in comment #60, qnx doesn't provide iconv API.
However, we can port GNU libiconv from http://www.gnu.org/software/libiconv/, if
that is needed.
Frank
Frank: all we need is a way to convert UCS-2 to |char*| (and vice versa)
suitable for calling fopen and other stdc functions.  does QNX provide any
function(s) to help with this?  if not, then yes.. you probably want to look
into porting libiconv.  thx!
Attached patch v1 patch (obsolete) — Splinter Review
ok, this patch includes UTF-8 fallback support as well as seawood's autoconf
tests (plus an additional autoconf test for iconv with const input).

please test this out with as many different locales and on as many different
platforms as you can.  thx!!
Attachment #85725 - Attachment is obsolete: true
Here's the alternative iconv check, AM_ICONV, that bbaetz pointed out. 
Unfortunately, it pulls in a couple of other .m4 files as well as a rpath
configure test.
cls: let me know which configure.in code you think we should use.
Patch 85922 does not work for AIX. The ISO charset names in AIX are ISO8859-n (not  
ISO-8859-n). This is true for iconv and nl_langinfo. Therefore, the test in
configure.in doesn't work and HAVE_ICONV and HAVE_ICONV_WITH_CONST_INPUT are not
defined. I changed nsNativeCharsetUtils.cpp accordingly but it still doesn't
work. It might be because I only did a partial rebuild. So I will do a new build
from scratch. 
patch v1 doesn't work for Solaris 5.9 chinese env. (chinese char was still 
displayed as ????)
kyle: what does your configure output say about iconv?
I didn't find any contents about iconv in configure output. How to get them?
so, my patch changes configure.in ... which requires that you run autoconf to
rebuild the configure shell script.  then when you run "gmake -f client.mk
build" you should see the usual "configure output" ... there should be a line
that says something like "Checking for iconv ..."

please verify that this is the case for you... thx!!
Confirming my previous comment:
patch 85922 (or v1 patch) works for AIX if I change ISO-8859-1 to ISO8859-1 in
configure.in (3x). 
In addition, I need patch85604 which defines -liconv in
mozilla/xpcom/build/Makefile.in, or XPCOM_LIBS, as defined in the patched
configure.in (to include -liconv), must be used in mozilla/xpcom/build/Makefile.in.

I have tested this fix running mozilla in the following locales: ja_JP (eucJP),
Ja_JP (shift-jis), JA_JP (UTF-8), zh_CN (eucCN). For each of these locales,
filenames with locale specific characters were displayed correctly in the Open
File list.

OpenVMS also needs ISO8859-1 instead of ISO-8859-1.

And the test program in configure.in is brain dead. I know it doesn't matter
since it never actually runs, but I was trying to use it as a test to see if
iconv actually worked on OpenVMS. There are three problems with the test:
  - inLeft should be the size of the input buffer, not the address
  - outLeft should be the size of the output buffer, not the address
  - the 4th arg to iconv should be a **char (just casting a *char doesn't cut it)

I'm still waiting to hear back from my contact in Japan as to whether or not we
need this fix in OpenVMS.
yeah, forget the configure.in tests in the v1 patch.  the v1 patch has other
problems too.  anyhow, i'm working on a much improved version of the patch now,
and should have something uploaded shortly.  i'd like to stay away from the huge
configure.in changes corresponding to the alt.iconv() check submitted by cls,
but i have yet to get in touch with cls to discuss this further.
Blocks: 141008
Attached patch v2 patch (obsolete) — Splinter Review
this patch includes a hopefully much improved autoconf test and some extra
fallback logic in nsNativeCharsetUtils.cpp to handle the different ways in
which encodings are written (e.g., UCS-2 vs UCS2).
Attachment #85922 - Attachment is obsolete: true
i think this latest patch is close to be the final patch.  what i plan to do is
land this on the trunk and use feedback from tinderbox to further customize
configure.in to ensure that iconv is used everywhere it can be used.  then it
should be pretty straightforward for folks to pull and build the trunk bits to
verify that the patch has done the trick.  after that we can think about landing
this on the branch.
Comment on attachment 86158 [details] [diff] [review]
v2 patch

seawood needs to review the configure changes.	Especially:

-MOZ_COMPONENT_XPCOM_LIBS='-L$(DIST)/bin -lxpcom'
+MOZ_COMPONENT_XPCOM_LIBS='$(XPCOM_LIBS)'
dougt: seawood actually authored the first version of these configure.in
changes... the lines you site in particular come from him.  still.. i have
changed his original version quite a bit, so i'll be sure to ask him to review
the version in this patch.
*** Bug 148953 has been marked as a duplicate of this bug. ***
*** Bug 149018 has been marked as a duplicate of this bug. ***
we've finally reproduced this problem on IRIX.  
It's broken in 1.0rc3 on IRIX.

:)
I have tested the latest v2 patch on AIX in en_US and ja_JP and it works
properly. The only remaining problem we are seeing is that we are not linking
with -liconv in mozilla/xpcom/base so we are still having to use attachment
85604 [details] [diff] [review]. Perhaps we should be adding -liconv to another define besides XPCOM_LIBS?
philip: thx again for the feedback... i'll look into getting the -liconv problem
solved.
Blocks: 1.1a
Frank and I looked at attachment 86158 [details] [diff] [review].

Frank asked if the code that called iconv was tread safe or used in a thread
safe manner.

The design here is that if iconv is not available then we fallback
to mbtowc/mbrtowc. To help tell if this at least marginally functional
Frank asked if the configure.in test could check that at least the
mbtowc/wctomb be able to convert ascii values; eg: 'a' (0x41) is converted
to u0041 and not say uFF41.

+#if defined(HAVE_ICONV_WITH_CONST_INPUT)
+    res = iconv(converter, input, inputLeft, output, outputLeft);
+#else
+    res = iconv(converter, (char **) input, inputLeft, output, outputLeft);
+#endif

nit: would it be attractive for the cast to become a macro?

   res = iconv(converter, ICONV_INPUT_CAST input, inputLeft, output, outputLeft);

+            res = xp_iconv(gUTF8ToUnicode, (const char **) &p, &n, (char **)
output, &outLeft);

Frank ask if you could use NS_ConvertUTF8toUCS2() / NS_ConvertUCS2toUTF8() 
instead of iconv here?

+ * XXX XXX XXX not for public consumption XXX XXX XXX

Could this be something like:

+                              **** NOTICE ****
+  
+             *** THESE ARE NOT GENERAL PURPOSE CONVERTERS ***
+ 
+  NS_CopyNativeToUnicode /  NS_CopyUnicodeToNative should only be used
+  by XP_COM for converting *FILENAMES* between native and Unicode. They
+  are not designed or tested for general encoding converter use.

Could you also put a notice on the lines where these calls are used to
direct readers to this message? Perhaps

// NS_CopyNativeToUnicode is for XP_COM filename use only; see
nsNativeCharsetUtils.h

+nsresult
+nsNativeCharsetConverter::UnicodeToNative(const PRUnichar **input,
...
+    if (gUnicodeToNative != INVALID_ICONV_T) {
...
+#if defined(ENABLE_UTF8_FALLBACK_SUPPORT)
+    else if ((gUnicodeToUTF8 != INVALID_ICONV_T) &&
+             (gUTF8ToNative != INVALID_ICONV_T)) {
...
+#endif
+
+    // fallback: truncate and hope for the best

Is there someway to add an assert if we are doing the fallback?
> Frank asked if the code that called iconv was tread safe or used in a thread
> safe manner.

yes, access to the iconv converters is protected by a global lock.  see the
constructor/destructor for nsNativeCharsetConverter.


> Frank asked if the configure.in test could check that at least the
> mbtowc/wctomb be able to convert ascii values; eg: 'a' (0x41) is converted
> to u0041 and not say uFF41.

sure, that sounds like a good plan.


> nit: would it be attractive for the cast to become a macro?

yeah, i like that.


> Frank ask if you could use NS_ConvertUTF8toUCS2() / NS_ConvertUCS2toUTF8() 
> instead of iconv here?

i certainly could, but it would be less performant... or at least, there would
be one extra buffer copy.  on the other hand, there are some definite benefits
to using our UTF8 - UCS2 converters... the resulting code would be simpler, and
we would be using more of our code.


> Could this be something like:
> ...
> Could you also put a notice on the lines where these calls are used to
> direct readers to this message? Perhaps

sure

> Is there someway to add an assert if we are doing the fallback?

i'll output a debug warning... perhaps in GlobalInit.
The patch appears to work on OpenVMS, but I can't really test it. I asked an HP
engineer in Japan to try andreproduce the problem, but he tells me that Japanese
file names are not really well supported, even in Japanese VMS, and that because
of this, any file names containing Japanese characters don't appear in file
lists. This is true of all/most applications running on Japanese VMS, not just
Mozilla. He says that this problem most certainly does NOT need fixing in 1.0
final, and so for this reason, I am NOT including the patch in the OpenVMS 1.0
final kit.
Attached patch v3 patch (obsolete) — Splinter Review
this patch incorporates suggestions from ftang and bstell, plus i also took a
stab at fixing the -liconv problem.  hoping this will be the final revision.
Attachment #86158 - Attachment is obsolete: true
Comment on attachment 86528 [details] [diff] [review]
v3 patch

A minor tweak needed before landing:

 I am one of those early users of the gmake builds on windows.	The
xpcom/io/Makefile.in adds the nsNativeCharsetUtils.cpp	file to the general set
of cpp files.  Instead, nsNativeCharsetUtils.cpp  should only be added here:


ifeq ($(MOZ_WIDGET_TOOLKIT),windows)
CPPSRCS 	+= nsLocalFileWin.cpp
else
CPPSRCS 	+= nsLocalFileUnix.cpp nsNativeCharsetUtils.cpp 
endif # windows
#endif
endif # OS2

I have only tested this on the window gmake build.  Make sense?
dougt: good point... actually, i'll probably just add empty implementations of
nsNativeCharsetUtils.cpp instead because eventually i would like these functions
to be available XP.
In trying to test the v3 patch on Sparc Solaris 7, I think I may have found a
problem with the configure tests. I get the following error during configure:
configure:8011: c++ -o conftest  -pthreads   -I/usr/openwin/include  conftest.C
-liconv  -lsocket -ldl -lm  1>&5
ld: fatal: library -liconv: not found
ld: fatal: File processing errors. No output written to conftest
collect2: ld returned 1 exit status

The message makes no sense, since I have libiconv in /usr/lib, which is in  
LD_LIBRARY_PATH. I have gotten the test program to build outside of configure.
Does configure change the library path?

Anyway, I forced the iconv tests to "yes" in configure and I am trying to build now.
I noticed a couple of items on reading this:

+        res = xp_iconv(gNativeToUnicode, input, &inLeft, (char **) output, ...
+        if (res != (size_t) -1) {
         ...
+            return NS_OK;
+        }
         ...
+        // reset converter
+        xp_iconv(gNativeToUnicode, NULL, NULL, NULL, NULL);

Can we always reset the converter after using it (even if t was successful). 
That way it could not carry state from conversion to conversion.

+    res = iconv(converter, ICONV_INPUT(input), inputLeft, output, outputLeft);
+    if (res == (size_t) -1) {
+        // on some platforms (e.g., linux) iconv will fail with
+        // E2BIG if it cannot convert _all_ of its input.  we're
+        // happy if it just converts some of its input.
+        if ((errno == E2BIG) && (*outputLeft < outputAvail))
+            res = 0;
+    }

Could you expound on the overall logic of returning a success when the conversion
failed? I do not understand the context where this would be attractive.
Attachment #86528 - Flags: review+
Comment on attachment 86528 [details] [diff] [review]
v3 patch

Darin: would you add to this bug the explaination you gave me over the phone
and pls always reset the converters.

subject to those: r=bstell@ix.netcom.com
Comment on attachment 86528 [details] [diff] [review]
v3 patch

I think I talked darin into taking a bit of time to

(1) invert loop/inner-if order to avoid mispredicted branches in conversion
loops;

(2) do A-B tests on Linux to see which solution performs best.

I'm concerned about the global converter shared with PRLock here -- there's no
space reason to avoid a converter per thread, and we could use NSPR's TLS to
avoid the locking overhead (although TLS isn't free to access, once allocated).
 Also, we've been using wctombs et al. on Linux for several milestones; however
good iconv may be, changing now (for 1.0.1, not for 1.1alpha) scares me.  I
favor a conservative approach, even though the existing wctombs-calling code is
being disturbed.

dbaron mentioned using a sink for best string-fu, rather than the nested loop
pattern.

I think drivers should let darin take a day here and do some measurements, and
produce a patch that we can all be happy with for trunk and branch.

/be
got errors when compiled on Solaris 9/Forte 6U2

ld: warning: file 1QpGKnD8s9L5Yympgh7i.o: attempted multiple inclusion of file
ld: warning: file t6U4GxapKqSiiBmX2i3U.o: attempted multiple inclusion of file
ld: warning: file E2n7-uidwqr1S0TOALG4.o: attempted multiple inclusion of file
ld: warning: file 1QpGKnD8s9L5Yympgh7i.o: attempted multiple inclusion of file
ld: warning: file t6U4GxapKqSiiBmX2i3U.o: attempted multiple inclusion of file
ld: warning: file E2n7-uidwqr1S0TOALG4.o: attempted multiple inclusion of file
ld: warning: file 9DdvNvaiuZQbBAq2FhO6.o: attempted multiple inclusion of file
ld: warning: file 7QUf-O3hji--ubaU86Sa.o: attempted multiple inclusion of file
ld: warning: file 4Ulp-2UqQkvl-mPfhR9-.o: attempted multiple inclusion of file
ld: warning: file 7QUf-O3hji--ubaU86Sa.o: attempted multiple inclusion of file
ld: warning: file 4Ulp-2UqQkvl-mPfhR9-.o: attempted multiple inclusion of file
ld: warning: file 9DdvNvaiuZQbBAq2FhO6.o: attempted multiple inclusion of file
ld: warning: file 1QpGKnD8s9L5Yympgh7i.o: attempted multiple inclusion of file
ld: warning: file t6U4GxapKqSiiBmX2i3U.o: attempted multiple inclusion of file
ld: warning: file E2n7-uidwqr1S0TOALG4.o: attempted multiple inclusion of file
ld: fatal: file @LIBICONV@: open failed: No such file or directory
ld: fatal: File processing errors. No output written to libxpcom.so
gmake[3]: *** [libxpcom.so] Error 1
gmake[3]: Leaving directory `/export/home/work/mozilla_4/mozilla/xpcom/build'
gmake[2]: *** [libs] Error 2
gmake[2]: Leaving directory `/export/home/work/mozilla_4/mozilla/xpcom'
gmake[1]: *** [tier_2] Error 2
gmake[1]: Leaving directory `/export/home/work/mozilla_4/mozilla'
gmake: *** [default] Error 2

I used current trunk source and applied v3 patch.
Kyle, once you apply the patch, you must regenerate configure using autoconf;
the process is relatively simple if you have autoconf locally:
from mozilla/
autoconf -l build/autoconf

(this will make configure from configure.in).

Autoconf 2.12 (you can't use 2.5x, sorry) can be found at
http://www.gnu.org/software/autoconf/ 
but it also requires GNU M4, 
http://www.seindal.dk/rene/gnu/

Normally this is automatically updated (on the trunk at least) from configure.in.
*or* darin could attach his generated configure script :)
yes, it is beeter that the new configure script is attatched.
thanks leaf. now I can compile xpcom successfully. but the whole building 
progress is not finished yet. I'll post the testing result ASAP.
tested v3 patch, it works for me (Solaris 9, CDE, Simplified Chinese)
ported gnu libiconv and tested v3 patch on the qnx6.2, it doesn't work.
same error as without the patch. "ldd mozilla-bin" does show  /usr/lib/libiconv.so.2

and here is part of the ./configure output:
...
checking for gnu_get_libc_version()... no
checking for iconv in -liconv... no
checking for libiconv in -liconv... yes
checking for iconv()... yes
checking for iconv() with const input... yes
checking whether va_list assignments need array notation... no
...

Frank
so, i ran the startup performance tests, comparing the old wcrtomb method to
iconv, and it looks like iconv is slightly slower... about 0.3% to be exact. 
this is probably due to the overhead of the PR_Lock as brendan suggested.  so,
that pretty much tells us that we should prefer wcrtomb/mbrtowc on linux since
we know that wchar_t is unicode under linux.

Frank: as for the problems with QNX, did you observe any debug warning messages
on the console having to do with conversion failures?  what exactly didn't work?
 thx!
"didn't work" means when I "File | Open File" to open a local file with filename
in Chinese, it pops up a window with an error "File xxxx doesn't exist".
There is no error on the console. Actually, I don';t see any messages on the
console since the RC releases. Is there an option that I need to use to turn off
those debug messages? -Frank
Frank: sounds like you are building an optimized build.  if you build debug,
you'll see some status/warning messages if something is going wrong.  also, were
the chinese file names showing up correctly in the file picker??  is the only
problem that you cannot open a file with a chinese name?
Attached patch v4 patchSplinter Review
revised per last comments from bstell and brendan.
Attachment #86528 - Attachment is obsolete: true
Comment on attachment 86906 [details] [diff] [review]
v4 patch

forwarding r=bstell, sr=brendan based on previous comments.
Attachment #86906 - Flags: superreview+
Attachment #86906 - Flags: review+
Yes, I am building optimized. I will make a debug build with 
your new v4 patch.
Yes, the filenames show up correctly in the file picker, but
I cannot open it. It pops up an error windows saying "File xxxx
doesn't exist".
BTW, even without your patch, filename shows up correctly but
I can't open it. It seems your patch doesn't make any difference.
Again, in RC2, filename shows up correctly and I can open it fine.
Frank: it sounds like the conversion from unicode to the native charset must be
failing on QNX where as the conversion from native charset to unicode seems to
be OK.  hopefully your debug build will indicate the problem.
those messages may mean something?
WARNING: mbtowc failed: possible charset mismatch, file
nsNativeCharsetUtils.cpp, line 579
full console.log attached.
yup, that warning indicates that iconv is not being used... can you upload your
config-defs.h to this bug report... i suspect HAVE_NL_TYPES_H or
HAVE_NL_LANGINFO are not being defined (in which case we are unable to infer the
native charset of your system).
Attached file config-defs.h
I don't see HAVE_NL_TYPES_H or HAVE_NL_LANGINFO in the config-defs.h
See attachment.
Frank: hmm.. we need nl_langinfo to determine the native charset of the system.
 without that we can't call iconv_open.  does QNX provide some other way to
determine the native filesystem charset?
Comment on attachment 86906 [details] [diff] [review]
v4 patch

a=asa (on behalf of drivers) for checkin to the 1.1 trunk.
Attachment #86906 - Flags: approval+
I asked around but there isn't such function. People usually just hard code
UTF8 or ISO-88??-1.
I understand this is probably not a good solution. but between something that
"may work" and "not work at all", what will you choose?
if mbtowc or wctomb fail, then we assume the multibyte encoding is ISO-8859-1. 
that should be the case corresponding to the console log uploaded by Frank.  so,
assuming the native charset is ISO-8859-1 and then using iconv wouldn't get us
anywhere.  perhaps UTF-8 is correct for QNX but i don't know.  someone would
have to hack the patch and test it out.
fixed-on-trunk

marking FIXED to get QA's attention.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Is there a quick way (say, through some environment variable) to change
mozilla's default assumption from "ISO-8859-1" to "UTF-8" to test it out on QNX?
I tried "View | Character Coding | .. " to change it, but it doesn't seem
to affect the File Picker ...
no there isn't any thing like that... you'd have to hack the source yourself.  i
can give you pointers if you want to go that route.
Katakai-san: Could you please verified it on Sun solaris machine? cause I only
have RH7.2 here which doesn't has this problem.
http://bugzilla.mozilla.org/show_bug.cgi?id=147333#c113
> Darin: ... and pls always reset the converters.

http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsNativeCharsetUtils.cpp#345
345 nsNativeCharsetConverter::NativeToUnicode(const char **input,
346                                           PRUint32    *inputLeft,
347                                           PRUnichar  **output,
348                                           PRUint32    *outputLeft)
349 {
350     size_t res = 0;
351     size_t inLeft = (size_t) *inputLeft;
352     size_t outLeft = (size_t) *outputLeft * 2;
353 
354     if (gNativeToUnicode != INVALID_ICONV_T) {
355 
356         res = xp_iconv(gNativeToUnicode, input, &inLeft, (char **) output, 
&outLeft);
357 
358         if (res != (size_t) -1) {
359             *inputLeft = inLeft;
360             *outputLeft = outLeft / 2;
361             return NS_OK;
362         }
363 
364         NS_WARNING("conversion from native to ucs-2 failed");
365 
366         // reset converter
367         xp_iconv(gNativeToUnicode, NULL, NULL, NULL, NULL);

Would it be possible to always reset the converters?

answer to #139
Edit->Preference->Navigator->Language

In bottom right of the dialog box, you can change the default charset.
bstell: if you look at the destructor for nsNativeCharsetConverter you will see
that we always reset the converter.  in fact, i probably needn't bother reseting
it in the place you quoted anymore.
I have done verification on Solaris 8 ja locales on SPARC/Inetl.
It's working fine. I'll try more testing other OS version and
locales tomorrow.

However, it seems that download manager always dumps core
here when I try to save contents to japanese filename. When
I disable download manager, the problem does not happen.
(I found similar bug but filed as new one - bug 150832)
> if you look at the destructor for nsNativeCharsetConverter you will see
> that we always reset the converter.

This would imply that NativeToUnicode and UnicodeToNative can only be called 
once between construction and destruction.

If so, can we put in an assert to detect multiple uses and warn developers?

brian: 

they are designed to be called multiple times between construction and
destruction.  the input is potentially multi-fragment, which means that it might
be important to save the state of the converter when moving to the next
fragment.  with the current patch, this is permitted.  if i were to reset the
converter after working on each fragment then i'd potentially lose such state. 
so, that can't be the right thing to do.  moreover, even if the input is
single-fragment, we might still call these routines (NativeToUnicode,
UnicodeToNative) more than once if the input size is greater than the maximum
output buffer size.
The solaris tinderboxes on the Ports pages have been orange for the past day. 
It looks like someone reset the tinderbox yesterday so I'm not absolutely sure
that it's this checkin but it definitely needs to be looked into as the first
orange cycled occurred during thattimeframe.
seawood: yeah, i noticed that too... unfortunately, the error message isn't all
that helpful.  i should probably login and try running the build manually to see
what's going on.
okay, this sound correct. Thanks Darin.
Jim Crumley,

Have you succeeded build on Solaris 7?

I understand there should not be any libiconv under /usr/lib/
on Solaris system. What is your libiconv.so?
Can you check which pkg does contain our libiconv.so?

% grep libiconv.so /var/sadm/install/contents 
/usr/lib/libiconv.so f none 0755 bin bin 23348 20350 941709164 SUNWhuccd
% pkginfo -x SUNWhuccd
SUNWhuccd       Traditional Chinese User Based Chinese Console Display Package
                (sparc) 8.0,REV=1999.11.04.10.14

However, one exception exists on Solaris 8. We have
shipped libiconv.so as internal use in Solaris 8 for
chinese locales. This is not valid libiconv.so. When
we use Solaris 8 as build system, we have to remove
SUNWhuccd package.

# pkgrm SUNWhuccd

There is no regression if we have the libiconv.so but
the binary built with the libiconv will not run on
Solaris 9 because there is no libiconv.so in Solaris 9.
Yes, I have gotten the patch to build and it seems to be working.

My libiconv on Solaris 7 is from the package that you mention - SUNWhuccd      
Traditional Chinese User Based Chinese Console Display Package (sparc)
7.0,REV=1.0.18
Verified the fix on Solaris 8 and 9.

However, I'm seeing Solaris iconv bug. To reset the
iconv state iconv(conv, NULL, NULL, NULL, NULL) causes
core dump only when PCK(SJIS) -> UTF-8 converter.
I have reported this problem to our OS engineers.
Status: RESOLVED → VERIFIED
Reopening... all Solaris tinderbox machines are in ORANGE status; regxpcom
crashes... and my own box (Solaris 2.7/SPARC) cannot run Mozilla either... ;-(

Steps to reproducte:
1. % rm component.reg
2. % ./mozilla -g -d dbx
3. (dbx) run
dbx/mozilla output looks like this:
-- snip --
(dbx) run
Running: mozilla-bin 
(process id 28880)
Reading libdemangle.so.1
File descriptors set to 512
Reading en_US.so.2
Type Manifest File:
/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/dist/bin/components/xpti.dat
nsNativeComponentLoader: autoregistering begins.
Reading ISO8859-1%UTF-8.so
Reading UTF-8%UCS-2.so
Reading UCS-2%UTF-8.so
Reading UTF-8%ISO8859-1.so
t@1 (l@1) signal SEGV (no mapping at the fault address) in __icv_iconv at
0xfdb80ad0
0xfdb80ad0: __icv_iconv+0x0074: ld      [%i3], %o0
dbx: internal warning: Typeid already exists with different name:__1nJnsAString_
or symclass:struct, ignore current
stab:./libxpcom.so:../../../../../../../home/mozilla/src/2002-06-12-08-trunk/mozilla/xpcom/io/nsNativeCharsetUtils.cpp
stab #137 __1nJnsAString_:U(0,115)
Current function is xp_iconv
  115       res = iconv(converter, ICONV_INPUT(input), inputLeft, output,
outputLeft);
(dbx) where
current thread: t@1
  [1] __icv_iconv(0x7b6a8, 0x0, 0x0, 0x0, 0x1, 0xfdb80bd0), at 0xfdb80ad0
  [2] _iconv(0x93ea0, 0x0, 0x0, 0x0, 0x0, 0x353), at 0xfe4c48ec
=>[3] xp_iconv(converter = 0x93ea0, input = (nil), inputLeft = (nil), output =
(nil), outputLeft = (nil)), line 115 in "nsNativeCharsetUtils.cpp"
  [4] nsNativeCharsetConverter::~nsNativeCharsetConverter(this = 0xffbeec8b),
line 333 in "nsNativeCharsetUtils.cpp"
  [5] NS_CopyNativeToUnicode(input = CLASS, output = CLASS), line 686 in
"nsNativeCharsetUtils.cpp"
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsLocalFileUnix.o"
dbx: warning: see `help finding-files'
  [6] nsLocalFile::GetLeafName(0xc6d78, 0xffbeee78, 0xf523c, 0xff0d6ce8,
0x80000000, 0xc6d78), at 0xff0d6d30
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsNativeComponentLoader.o"
  [7] nsNativeComponentLoader::AutoRegisterComponent(0x9f330, 0xffbeef10, 0x1,
0xffbeef20, 0xff0ee280, 0xff1e3484), at 0xff0ecc90
  [8] nsNativeComponentLoader::RegisterComponentsInDir(0x9f330, 0x0, 0xc4f18,
0x0, 0xffbef028, 0xffbef030), at 0xff0eb9f8
  [9] nsNativeComponentLoader::AutoRegisterComponents(0x9f330, 0x0, 0x121468,
0x2800, 0xff1ad830, 0xfe53a05c), at 0xff0eb834
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsComponentManager.o"
  [10] nsComponentManagerImpl::AutoRegisterImpl(0x9a1e0, 0x0, 0xffbef158,
0xff1f0f80, 0xff0eb7dc, 0x74cb4), at 0xff0e4438
  [11] nsComponentManagerImpl::AutoRegister(0x9a1e0, 0x0, 0x0, 0xff1e627c,
0xff1e3484, 0x2800), at 0xff0e4218
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsComponentManagerObsolete.o"
  [12] nsComponentManager::AutoRegister(0x0, 0x0, 0x80000000, 0x80000000,
0xffbef22c, 0x0), at 0xff0eaa18
  [13] main1(argc = ???, argv = ???, nativeApp = ???) (optimized), at 0x195bc
(line ~1304) in "nsAppRunner.cpp"
  [14] main(argc = ???, argv = ???) (optimized), at 0x1a780 (line ~1805) in
"nsAppRunner.cpp"
(dbx) print converter, input, inputLeft, output, outputLeft
converter = 0x93ea0
input = (nil)
inputLeft = (nil)
output = (nil)
outputLeft = (nil)
-- snip --

I guess this is the problem described in comment #153 ... ;-(
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Taking myself...
Assignee: darin → Roland.Mainz
Status: REOPENED → NEW
Status: NEW → ASSIGNED
Attachment #87486 - Attachment is obsolete: true
Reuqesting r=/sr= ...
I understand iconv(conv, NULL,*,*,*) needs to be
called to reset the conv state. I'm thinking the
Solaris bug happens when iconv(conv, NULL,NULL,NULL,NULL)
is called, which means when 3rd and 4th arguments are not NULL,
this bug does not happen. I'll check.

Sorry, 4th and 5th arguments should not be NULL.

The following codes can work in my environment even
when input and inputLeft = NULL.

    char        output[BUFSIZ];
    size_t      outputLeft=BUFSIZ;
    iconv(converter, input, inputLeft, &output, &outputLeft);
Masaki Katakai wrote:
> Sorry, 4th and 5th arguments should not be NULL.

Mhhh, can you file a bug against the iconv(3) manual page, please ? Such a
comment in the NOTES would be nice...
> Mhhh, can you file a bug against the iconv(3) manual page, please ? Such a
> comment in the NOTES would be nice...

It's not a spec. This is the bug of Solaris iconv().

Roland, could you wait? I'm now asking our OS engineer
to verify that can reset the state correctly.
Comment on attachment 87488 [details] [diff] [review]
Better patch for 2002-06-12-08-trunk per cls's comments

Ouch! Now I see what's going wrong... you want to reset the converter (e.g.
legal usage of |xp_iconv()|) ...
Attachment #87488 - Flags: needs-work+
I moved all reset-converter xp_iconv()-calls to a new function
(|xp_iconv_reset()|) to avoid that we have to touch tons of lines in the
future...
Attachment #87488 - Attachment is obsolete: true
Roland: latest patch looks good to me.  if you don't mind i'd like to keep this
bug assigned to myself since i still have to land the original patch (plus any
fixups) on the branch.

reassigning back to me.
Assignee: Roland.Mainz → darin
Status: ASSIGNED → NEW
Comment on attachment 87495 [details] [diff] [review]
Patch for 2002-06-12-08-trunk

looks good to me.. sr=darin
Attachment #87495 - Flags: superreview+
*** Bug 151523 has been marked as a duplicate of this bug. ***
Darin Fisher wrote:
> Roland: latest patch looks good to me.  if you don't mind i'd like to keep 
> this bug assigned to myself 

NP... :)

> since i still have to land the original patch (plus any fixups) on the branch.

I thought the original patch is already in the 1.0.1-branch...

... let's wait if katakai likes the patch or not... :)
Comment on attachment 87495 [details] [diff] [review]
Patch for 2002-06-12-08-trunk

r=alecf
Attachment #87495 - Flags: review+
1. Can we please wait for katakai's response before checking the patch "in" ?
2. IMHO it may be wise to create a test tool (e.g. "TestNativeCharsetUtils") -
can anyone file a RFE for that, please ? :)
well... hmm... why don't we just check it in since it should fix the tinderbox
orange.  if katakai comes back and has some suggestions on how to further fixup
the iconv code, we can make another revision.  given that we'll need this patch
(or something very similar) on the branch, i think it's worth while to expedite
things a bit.
in addition, this patch just makes the existing code easier to fix, no matter
how we end up fixing it.
I'm having some problems here.  When I try to run the patched Mozilla 1.0 on
older Solarii (either Intel or sparc - doesn't matter) I get Segmentation faults.  

Here's one of the scenarios:
I make a Mozilla 1.0 release build with v4 patch applied on Solaris 2.6 sparc. 
It compiles correctly.  When I run this build on my Solaris 8 multi-language
system, it works correctly, and when the locale is set to ja, I can load the
double-byte files correctly.  HOWEVER - when I run a fresh installation of this
build on my Solaris 2.6 system (that was actually used to build it), it
Segmentation Faults before I can even get the browser loaded.  A build done on
the same machine - without the patch - works just fine.  

I have a theory - my Solaris 2.6 box doesn't have Japanese locales - only a few
European locales.  The same situation is happening with my Solaris Intel boxes.
    It builds on 7, but only runs on 8.  The 7 is Euro only - the 8 is
multi-language.
Paul Pietromonaco wrote:
> I'm having some problems here.  When I try to run the patched Mozilla 1.0 on
> older Solarii (either Intel or sparc - doesn't matter) I get Segmentation 
> faults.

Please try attachment 87495 [details] [diff] [review] - that should fix your problem.
if that patch doesn't do the trick, then can you perhaps supply a stack trace
corresponding to the crash?
The patch is good. Thanks!!

One info:
I've talked with OS engineer who is iconv expert.
iconv(conv, NULL, ...) means two things,

 1) reset the state of converter
 2) put sequence into outputs if necessary if output
   encoding is stateful.

2) For example, output is iso-2022-jp, iconv() does not
put escape sequence at the end of outputs at normal conversion.
At resetting (called by iconv(conv, NULL, ...)), escape
sequence is put into output if necessary e.g. end of kanji
character. So, strictly speaking, we need to look into
outputs even at iconv(conv, NULL, ...). However, I understand
there is no stateful encoding such as iso-2022-jp as OS
platform locale. So the patch is OK.
marking FIXED
Status: NEW → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Roland Mainz wrote:
>
>Paul Pietromonaco wrote:
>> I'm having some problems here.  When I try to run the patched Mozilla 1.0 on
>> older Solarii (either Intel or sparc - doesn't matter) I get Segmentation 
>> faults.
>
>Please try attachment 87495 [details] [diff] [review] - that should fix your problem.

That indeed fixed my problem.  Thanks for the prompt reply! (^_^)
attachment 87495 [details] [diff] [review] fixes the same crash in mozilla 1.1a on solaris 7 for me too.
Thanks!
Hi everyone,

Sorry to pester you - I'm very new to using Bugzilla - but I have a question. 
I've found a pretty severe bug that corrupts your User profile that I believe is
directly related to the patch.  However - it's pretty obscure, and the steps to
reproduce are not common. 

I'm thinking it's probably better to spawn a new bug and let this one stay
Resolved Fixed. However, I'm not sure if that's the right thing to do.  Can
someone let me know, either via this bug, or privately, what action you'd like
me to take?

Thanks in advance for your time,
Paul
paul: it'd probably be best to file a separate bug in this case.  please add a
comment to this bug indicating the bug number so others interested can follow
the bug report.  thx!
can someone please verify that this patch has done the trick?  (i.e., mark it
verified.)  that way we'll be able to land this on the 1.0 branch for mozilla
1.0.1.  thx!!
please checkin to the 1.0.1 branch. once there, remove the "mozilla1.0.1+"
keyword and add the "fixed1.0.1" keyword.
Keywords: adt1.0.1
ylong - can you get this one verified asap? thanks!
Blocks: 143047
Whiteboard: [adt1 RTM] → [adt1 RTM] [ETA 06/21]
I never able to reproduce this bug on my linux RH7.2, so I'm afraid I'm not the
right person to verify it.  Since the bug was filed for Sun Sloaris. 
Katakai-san, or some one else who had this problem, could you please help us to
verify this? thanks a lot!
QA Contact: ylong → katakai
Sure. I'll try.
add kevin.zhou@sun.com in the cc list
ylong- the problem cannot be reproduce on linux. It can be reproduce on Solaris
or other unix. However, since the patch is big and touch the code linux will be
use and we want to make suer the patch fix the solair problem without making
Linux worst, we need your help to make srue the current trunk with the patch do
not have additional problems on Linux. While you cannot say "I test it and it
fix the problem", it is possible for you to say "I test it and it do not
introduce problem on Linux but I cannot say anything about Solaris or other unix"

We can ask kevin.zhou@sun.com to allocate QA on Solaris testing. Kataki - can
you drive that? kevin is locate in china. kevin- can you verify the trunk build
that cannot reproduce the problem after the fix. if you can verify that, then we
can ask for landign of branch.
> ylong- ... However, since the patch is big and touch the code linux 
> will be use and we want to make suer the patch fix the solair problem 
> without making Linux worst, we need your help to make srue the current > trunk
with the patch do not have additional problems on Linux. ...

Yes, when I said "I nerver able to reproduce this problem" meant before and
after the checked-in.  But for verify this particular bug, we need some one had
this problem before.

I'll definitely add comments(or maybe file a seperate bug also) here in case I
found any linux regression problem.
marking VERIFIED per previous few comments.
Status: RESOLVED → VERIFIED
Adding adt1.0.1+ on behalf of the adt for checkin to the 1.0 branch.  Please get
drivers approval before checking in. When you check this into the branch, please
change the mozilla1.0.1+ keyword to fixed1.0.1
Keywords: adt1.0.1adt1.0.1+
Whiteboard: [adt1 RTM] [ETA 06/21] → [adt3 RTM] [ETA 06/21]
ylong- you try seveal differnt locale , right?
verified on Solaris 9, Chinese locale. trunk build.
Still does not work in my machine.
I am using trunk of:

       Mozilla 1.1a+
       Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.1a+) Gecko/20020620

My env is:

       Sun Microsystems Inc.   SunOS 5.8       Generic February 2000
       $ locale
       LANG=zh
       LC_CTYPE="zh"
       LC_NUMERIC="zh"
       LC_TIME="zh"
       LC_COLLATE="zh"
       LC_MONETARY="zh"
       LC_MESSAGES="zh"
       LC_ALL=
       $ 

Anto: yours?
Jay, which build id is displayed on browser title?
I'm using 2002061922 of mozilla-sparc-sun-solaris2.6.tar.gz.
It works fine fo me in ja/ja_JP.PCK(SJIS). Also I've tried in zh
locale and I don't see any problem.

Could you try truss and check correct iconv.so is being picked up?
In my Solaris 8, it seems that correct modules are picked and
zh filename can be displayed on filepicker.

9860:   open("/usr/lib/iconv/gb2312%UTF-8.so", O_RDONLY) = 9
9860:   open("/usr/lib/iconv/UTF-8%gb2312.so", O_RDONLY) = 9
Katakai:
I am using debuging build. that means, I checkout trunk code, and build using
Forte u2, so the Build ID: is 00000000, the date of the code is 6/20.

What is the difference if I download releasing build from mozilla?
I verify in Simplified Chinese/Traditional Chinese/Korean and UTF-8 locales on
Solaris 9:

The mozilla trunk 2002061922 works fine.

but in ko.UTF-8 locale, file name seems not OK. 
I had verified this bug on solaris8 with mozilla-sparc-sun-solaris2.6.tar.gz.  
It is seems ok.  I have tested  GBK&ZH locale on Solaris8, all chinese fonts 
been showed correctly. I found another problem is the chinese file name can not 
been showed correctly in url bar, but it is not due to this bug.
We, Sun China and Japan team are now testing 2002061922 trunk
in all possible locales (including CJK,Thai,Russian and Euro...)
on Solaris 8 and Solaris 9.

We've completed the verification of the locales *except* ko.UTF-8
locale of Solaris 9. The results are OK for now.

We'll let you know the result of ko.UTF-8 later.
In comment #191:
> ylong- you try seveal differnt locale , right?
Yes, I tried on my linux RH7.2: 
en_US, C, ja_JP.eucJP, zh_CN, zh_TW.Big5, ko_KR.eucKR.
Some locales(like ja_JP.SJIS...etc.) I can not try cause they falled back to C.
It looks pretty good from the test result Sun Japan and Sun China did . Cool we
really have a global team :) darin, since we got all approval already and none
of the testing result show red flag, let's land it asap :) :) :) thank you very
very very very much :) Great job. 
actually, there could be problems with UTF-8.  i think we'd try to convert from
UTF-8 to UTF-8 to UCS-2.  instead of directly from UTF-8 to UCS-2, and vice
versa.  that might be causing problems with UTF-8, so i'm very anxious to hear
back from folks about the UTF-8 verfication results.  thx!!
fixed1.0.1
Blocks: 146292
No longer blocks: 141008
Blocks: 153562
It seems this change has caused a regression on solaris 8 (x86 & sparc).

The user profile directory in the "~/.mozilla/appreg" file gets truncated by
one character, in case the browser crashes.  This problem didn't exist in the
mozilla 1.0 release.


See also [Bug 153562] crash trashes user directory, requires new profile

Comment #180 for this bug also seems to hint at this profile problem (although
I'm not 100% sure, since paul didn't mention details).




Problem is that the solaris iconv conversion module adds a 'byte order mark'
BOM unicode charater (uFEFF) in front of the first character of the first
converted UCS-2 unicode string.  This breaks the assumption made in 
NS_CopyNativeToUnicode(), that the UCS-2 output string never contains more
unicode characters then the native input string.

Question is: should bug 147333 be reopened,  or shall we track the profile
problem for solaris on bug 153562?
> Problem is that the solaris iconv conversion module adds a 'byte order mark'
> BOM unicode charater (uFEFF) in front of the first character of the first
> converted UCS-2 unicode string.

Is this the expected iconv() behavior?

BOMs are meant to be used as the first character of a Unicode file or where
specified in a protocol.  It seems odd that iconv() would prepend BOMs to
strings since it has no idea how or where that string will be used.

What do other iconv() implementations do?
> BOMs are meant to be used as the first character of a Unicode file or where
> specified in a protocol.  It seems odd that iconv() would prepend BOMs to
> strings since it has no idea how or where that string will be used.

Note that the solaris iconv() does not prepend BOMs for every converted string;

the BOM seems to be prepended *exactly* *once* during the first iconv() call
after an iconv_open of any conversion that outputs UCS-2.

Note also, that solaris 8 can convert into UCS-2, UCS-2LE and UCS-2BE.	The
BOM is only inserted when we convert into UCS-2.  On Solaris 8 SPARC, iconv ->
UCS-2 produces big endian unicode characters while on Solaris 8 Intel it
produces little ending unicode characters.

With UCS-2LE or UCS-2BE the BOMs are not inserted.


See also the attached small test program.  It converts the five character
ISO8859-1 string 'Hello' into UCS-2 using iconv().  On Solaris 8 SPARC, I get
different UCS-2 output for the first and second conversion:

% uname -a
SunOS foobar 5.8 Generic_108528-13 sun4u sparc SUNW,Ultra-60
% ic2
converting 'Hello' from ISO8859-1 -> UCS-2
converted 5 bytes input to 12 bytes output
fe ff 00 48 00 65 00 6c 00 6c 00 6f 
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f 


Btw. this happens for every conversion to UCS-2, for example the UTF-8 -> UCS-2

conversion module shows the same behaviour:

% ic2 UCS-2 UTF-8
converting 'Hello' from UTF-8 -> UCS-2
converted 5 bytes input to 12 bytes output
fe ff 00 48 00 65 00 6c 00 6c 00 6f 
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f 


And note, that solaris 2.6 behaves differently.  It does not offer conversion
to the endian-specific variants UCS-{2,4}{BE,LE},  and converting to UCS-{2,4}
does not add a BOM:


% uname -a
SunOS xxxx 5.6 Generic_105181-26 sun4m sparc sun4m
% ic2 UCS-2 UTF-8
converting 'Hello' from UTF-8 -> UCS-2
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f 
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f
Interesting...
A couple of possible hacks for Solaris8+:
(1) Do a spurious iconv() and then throw away the results with the BOM.
(2) Call iconv() with the appropriate UCS-2LE or UCS-2BE instead of UCS-2

Is this a problem on platforms other than Solaris8?
> A couple of possible hacks for Solaris8+:
> (1) Do a spurious iconv() and then throw away the results with the BOM.
> (2) Call iconv() with the appropriate UCS-2LE or UCS-2BE instead of UCS-2

I've prepared patches for both ideas. See bug 153562.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: